[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless

Minh Hon Chau Tue, 23 Aug 2016 04:11:56 -0700

Yes, these changes were made recently. They are described in the patch sent out
for review (on 18/08/2016). I am copying the description here in case something
went wrong with the review request email.


"
If an admin operation is running when the cluster goes headless, the normal
admin operation sequence is interrupted. Since both SCs are down, the SI
assignments at AMFND may still be ongoing, or may complete, during the
headless period. After headless, the admin operation should be continued.
This patch series adds support for continuing admin operations after
headless.

To resume the admin operation after headless, the following states, as they
were when the cluster went headless, need to be restored: SUSI fsm states,
SG fsm states, SI Dependency states (not supported in this patch), the SU
Switch toggle, and the SU operation list in the SG.

Currently, the SG fsm states are set differently in each specific SG model.
Also, the rule for adding an SU to the SG's operation list is not
consistent: in most places an SU is added to the operation list after AMFD
sends an su_si_assign event for that SU, but in some scenarios an SU is
added to the list for other purposes (failover). These inconsistencies make
the state deduction logic hard to implement.

This patch introduces new runtime attributes (RTAs): osafAmfSGSuOperationList,
osafAmfSGFsmState, osafAmfSISUFsmState and osafAmfSUSwitch. Throughout the
AMFD lifetime, they write the SG's SU operation list, the SG fsm state, the
SUSI fsm state and the SU Switch from AMFD memory to IMM. When the cluster
comes back from headless, these RTAs are read from IMM to restore the states
in AMFD's memory. The patch also adds a field to the state_info (headless
synchronization) message that carries the current SUSI fsm state. The two
SUSI fsm states (the one read from IMM and the one synced from AMFND) are
used together to validate the new RTA states read from IMM after headless.
For example, if the SUSI fsm state in IMM is ASGN and the synced SUSI fsm
state is ASGND, then the HA state must be ACTIVE or STANDBY. Such validation
is necessary because the headless interruption is unplanned and the recovery
depends heavily on the RTAs read from IMM.
"
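The validation rule in the quoted description can be sketched roughly as follows. This is an illustration only, not the patch's code. The enums `SusiFsm` and `HaState`, the struct `RestoredSusi` and the function `IsConsistent` are hypothetical simplifications, and the real AMFD uses its own types. The sketch encodes only the single example rule given above (IMM says ASGN, AMFND says ASGND, so the HA state must be ACTIVE or STANDBY) and accepts every other combination unchecked.

```cpp
// Hypothetical, simplified stand-ins for AMFD's SUSI fsm and HA state types.
enum class SusiFsm { Absent, Asgn, Asgnd, Unasgn, Modify };
enum class HaState { Active, Standby, Quiesced, Quiescing };

// Hypothetical per-SUSI record combining the value read back from the
// osafAmfSISUFsmState RTA in IMM with what AMFND reported in its
// state_info message after headless.
struct RestoredSusi {
  SusiFsm imm_fsm;     // SUSI fsm state persisted in IMM before headless
  SusiFsm synced_fsm;  // SUSI fsm state synced from AMFND after headless
  HaState ha;          // HA state synced from AMFND after headless
};

// Encodes only the example rule from the patch description:
// IMM says ASGN and AMFND says ASGND => HA state must be ACTIVE or STANDBY.
// Any other combination is not checked here and is accepted.
bool IsConsistent(const RestoredSusi& s) {
  if (s.imm_fsm == SusiFsm::Asgn && s.synced_fsm == SusiFsm::Asgnd) {
    return s.ha == HaState::Active || s.ha == HaState::Standby;
  }
  return true;
}
```

With this sketch, `IsConsistent({SusiFsm::Asgn, SusiFsm::Asgnd, HaState::Standby})` is `true` and `IsConsistent({SusiFsm::Asgn, SusiFsm::Asgnd, HaState::Quiesced})` is `false`. A real implementation would presumably add further rules for the other fsm state pairs.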



---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** review
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Tue Aug 23, 2016 11:04 AM UTC
**Owner:** Minh Hon Chau


This ticket is more of an enhancement that targets how AMFD detects and
recovers the transient SUSIs left over from headless. There are three major
situations:
(1) - The cluster goes headless; SU/node failover can happen on any payload,
or any payload can be hard rebooted/powered off by the operator; then the
cluster recovers.
(2) - An admin op is issued on any AMF entity, then the cluster goes headless.
During headless, the intermediate HA assignments of the admin op sequence
between AMFND and the components could be in one of these states:
    (2.1) The assignment completes (the component responds OK to the CSI
callback), then the cluster recovers.
    (2.2) The assignment is still ongoing when the cluster recovers. Afterwards
the assignment could complete, the CSI callback could return FAILED_OPERATION,
or another error could occur.

When the cluster recovers, AMFD has collected all assignments from all
AMFNDs. These assignments can be in assigned or assigning states while their
HA states do not conform to their SG's redundancy model. Situations (1),
(2.1) and (2.2) can happen in combination: while an admin op (2) is in
progress, the cluster can go headless and any kind of failover (1) can
happen during headless.
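The situations above can be told apart, roughly, by comparing what was recorded before headless with what AMFND reports afterwards. The sketch below is a hypothetical illustration of that mapping, not the ticket's design. `SusiFsm`, `Situation` and `Classify` are invented names, and `std::nullopt` is assumed to mean "no AMFND reported this SUSI". The sketch simplifies in several ways. It does not separate a completed unassignment (which also leaves the SUSI missing) from a lost one. It does not model the FAILED_OPERATION or error outcomes of (2.2).

```cpp
#include <optional>

// Hypothetical, simplified SUSI fsm states (not AMFD's real type).
enum class SusiFsm { Absent, Asgn, Asgnd, Unasgn, Modify };

// Hypothetical labels for the situations described in the ticket.
enum class Situation {
  Unchanged,            // SUSI was not transient when the cluster went headless
  LostDuringHeadless,   // (1) SU/node failover, payload rebooted or powered off
  CompletedInHeadless,  // (2.1) transient assignment finished during headless
  StillInProgress       // (2.2) transient assignment still ongoing at recovery
};

// imm_fsm:  SUSI fsm state persisted in IMM before headless.
// reported: SUSI fsm state synced from AMFND after headless, or
//           std::nullopt if no AMFND reported this SUSI (assumed lost).
Situation Classify(SusiFsm imm_fsm, std::optional<SusiFsm> reported) {
  if (!reported) {
    return Situation::LostDuringHeadless;
  }
  const bool was_transient = imm_fsm == SusiFsm::Asgn ||
                             imm_fsm == SusiFsm::Unasgn ||
                             imm_fsm == SusiFsm::Modify;
  if (!was_transient) {
    return Situation::Unchanged;
  }
  if (*reported == SusiFsm::Asgnd) {
    return Situation::CompletedInHeadless;
  }
  return Situation::StillInProgress;
}
```

Because the checks are applied per SUSI, one recovery can contain all three situations at once, which matches the combination case described above.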



---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/


_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
