[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless

Minh Hon Chau Thu, 01 Sep 2016 02:48:05 -0700

I am copying the implementation phases described in earlier comments:

@1. Admin op continuation, with no recovery required for faults during headless
@1.a) All CSI callbacks complete during headless, but the SUSI states are still 
QUIESCED/QUIESCING
@1.b) One of the CSI callbacks is still ongoing after headless (would AMFD have 
to wait for it?)


@2. Recovery from faults. (Fault recovery needs to take into account the admin 
op continuation implemented in step @1.)
Needs #1902
@2.a) Faults in normal flow: no admin op continuation is required after 
headless, but a fault did happen during headless
@2.b) Faults happen during an admin operation while headless; after headless, 
AMFD needs to handle fault recovery together with admin op continuation.

@3. @1 + @2 + With SI Dep.

So:
phase @1 above has been included in 5.1 FC
phase @2 has been sent out for review but has had to be postponed because of 
the 5.1 FC deadline
phase @3 is all about SI Dep
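
One way to read the phase breakdown above is as a decision on three conditions observed after headless. The sketch below is only an illustration of that reading: the `Phase` enum, the function name and its parameters are hypothetical and do not come from the AMFD code base.

```python
from enum import Enum


class Phase(Enum):
    NONE = 0                   # nothing left over from headless
    ADMIN_OP_CONTINUATION = 1  # phase @1
    FAULT_RECOVERY = 2         # phase @2 (may include admin op continuation)
    WITH_SI_DEP = 3            # phase @3 = @1 + @2 with SI dependency


def required_phase(admin_op_pending: bool,
                   fault_during_headless: bool,
                   has_si_dep: bool) -> Phase:
    """Map the situation found after headless to the phase that covers it.

    Hypothetical helper: it only mirrors the phase list in this comment.
    """
    if not admin_op_pending and not fault_during_headless:
        return Phase.NONE
    if has_si_dep:
        # Anything involving SI dependency is deferred to phase @3.
        return Phase.WITH_SI_DEP
    if fault_during_headless:
        # Covers @2.a (no admin op) and @2.b (fault during admin op).
        return Phase.FAULT_RECOVERY
    return Phase.ADMIN_OP_CONTINUATION


print(required_phase(True, False, False).name)
print(required_phase(True, True, False).name)
```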


---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Thu Sep 01, 2016 09:34 AM UTC
**Owner:** nobody


This ticket is more of an enhancement that targets how AMFD detects and 
recovers the transient SUSIs left over from headless. There are three major 
situations:
(1) - The cluster goes headless; su/node failover can happen on any payload, or 
any payload can be hard rebooted/powered off by the operator; then the cluster 
recovers
(2) - An admin op is issued on any AMF entity, and the cluster goes headless. 
During headless, the intermediate HA assignments of the whole admin op sequence 
between AMFND and components could be:
    (2.1) The assignment completes, the component returns OK for the csi 
callback, then the cluster recovers
    (2.2) The assignment is ongoing when the cluster recovers. The assignment 
could complete afterward, or the csi callback could return FAILED_OPERATION, or 
an error could also occur
    
By the time the cluster recovers, amfd has collected all assignments from all 
amfnd(s). These assignments can be in assigned or assigning states while their 
HA states do not conform to their SG redundancy model. Any of (1), (2.1) and 
(2.2) can happen in combination, which means that while an admin op (2) is 
being issued, the cluster can go headless and any kind of failover (1) can 
happen during headless.
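
The detection step described above could be sketched roughly as follows. This is not AMFD code: the `Susi` record, the `find_transient_susis` function and the string state names are hypothetical, and the conformance checks assume a 2N-style SG in which each SI has exactly one ACTIVE assignment and any STANDBY is paired with an ACTIVE.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Susi:
    su: str
    si: str
    ha_state: str   # "ACTIVE", "STANDBY", "QUIESCED" or "QUIESCING"
    fsm_state: str  # "ASSIGNED" or "ASSIGNING"


def find_transient_susis(susis):
    """Return (susi, reason) pairs for assignments that need attention.

    Assumption: 2N-like rules, i.e. one ACTIVE per SI and no STANDBY
    without an ACTIVE. Other redundancy models need different checks.
    """
    active_count = Counter(s.si for s in susis if s.ha_state == "ACTIVE")
    flagged = []
    for s in susis:
        if s.fsm_state == "ASSIGNING":
            reason = "assignment still in progress"
        elif s.ha_state in ("QUIESCED", "QUIESCING"):
            reason = "left in " + s.ha_state
        elif s.ha_state == "ACTIVE" and active_count[s.si] > 1:
            reason = "more than one ACTIVE for this SI"
        elif s.ha_state == "STANDBY" and active_count[s.si] == 0:
            reason = "STANDBY without an ACTIVE for this SI"
        else:
            continue  # conforms to the assumed redundancy rules
        flagged.append((s, reason))
    return flagged


# Example: assignments as they might be collected from amfnd after headless.
collected = [
    Susi("SU1", "SI1", "QUIESCED", "ASSIGNED"),
    Susi("SU2", "SI1", "STANDBY", "ASSIGNED"),
    Susi("SU1", "SI2", "ACTIVE", "ASSIGNED"),
    Susi("SU2", "SI2", "STANDBY", "ASSIGNING"),
]
for susi, reason in find_transient_susis(collected):
    print(susi.su, susi.si, reason)
```

Keeping detection separate from recovery reflects the combinations mentioned above: the same flagged set can then be handled by admin op continuation, fault recovery, or both.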


