[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless

Praveen Thu, 11 Aug 2016 05:42:23 -0700

Hi Minh.

I understand that working with SUSI states deduced from AMFND required some 
code additions in AMFD, because those states are already updated and the SG 
FSM has to be adjusted to them.
Since we now read from IMM, we have to be ready for SUSI information missing 
from IMM when AMFD dies before updating IMM. During the node-failover 
discussion we covered what should be done for an SI when we do not get the 
complete SUSIs for the failed node from IMM. In such a case, removing that 
SI's assignments from all the SUs will be the only choice.
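
To make the rule above concrete, here is a minimal sketch in Python. It is 
not AMFD code. The `Susi` record, the `plan_removals` helper, and the idea 
that AMFD has an `expected_on_failed_node` set of (SU, SI) pairs to compare 
against are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Susi:
    """Hypothetical SUSI record as read back from IMM (not an AMFD class)."""
    su: str
    si: str
    ha_state: str  # e.g. "active", "standby", "quiesced", "quiescing"


def plan_removals(imm_susis, expected_on_failed_node):
    """Return {si: [su, ...]} for SIs whose SUSIs in IMM are incomplete.

    imm_susis: list of Susi objects read from IMM.
    expected_on_failed_node: set of (su, si) pairs that should exist for the
        failed node. How AMFD would know this set is an assumption here.
    """
    present = {(s.su, s.si) for s in imm_susis}
    # An SI is incomplete if any expected SUSI of the failed node is missing.
    incomplete = {si for (su, si) in expected_on_failed_node
                  if (su, si) not in present}
    removals = {si: [] for si in incomplete}
    # For an incomplete SI, remove its assignments from all SUs that have one.
    for s in imm_susis:
        if s.si in incomplete:
            removals[s.si].append(s.su)
    return {si: sorted(sus) for si, sus in removals.items()}


imm = [
    Susi("SU2", "SI1", "standby"),   # SU1/SI1 never reached IMM
    Susi("SU1", "SI2", "active"),
    Susi("SU2", "SI2", "standby"),
]
expected = {("SU1", "SI1"), ("SU1", "SI2")}
plan = plan_removals(imm, expected)
```

In this example SI1 is missing its record for SU1, so its remaining 
assignment on SU2 is scheduled for removal, while SI2 is left for normal 
node-failover handling.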


Regarding the other case, an assignment response followed by a comp fault: in 
case of comp-failover recovery, AMFND cannot delete the SUSIs itself. It 
requires AMFD's help. So if a comp faults after responding to 
quiesced/quiescing assignments during the headless state with comp-failover 
recovery, AMFND will have to buffer this assignment response as well as the 
recovery request. Here AMFND cannot delete assignments until it gets the 
removal of assignments from AMFD. So AMFND will have to send both the 
assignment completion message and the recovery request message to AMFD. This 
is for the 2N model without SI dep. If SI dep is configured, AMFND may be 
processing assignments for multiple dependent SIs simultaneously, so there 
too the assignment responses and any recovery request need to be buffered 
together. Since, at present, only "Restart or Reboot" is supported, this can 
be revisited when other escalations are supported.
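
The buffering described above could look roughly like the following Python 
sketch. It only illustrates the ordering idea. `HeadlessBuffer`, its method 
names, and the tuple message format are hypothetical and do not correspond to 
AMFND's actual message handling.

```python
from collections import deque


class HeadlessBuffer:
    """Hypothetical stand-in for AMFND queuing messages while AMFD is away."""

    def __init__(self):
        self.headless = True
        self._pending = deque()  # messages waiting for AMFD, oldest first
        self.sent = []           # messages delivered to AMFD, in order

    def _send(self, msg):
        # While headless, nothing can be delivered, so keep arrival order.
        if self.headless:
            self._pending.append(msg)
        else:
            self.sent.append(msg)

    def assignment_done(self, su, si, ha_state):
        """Component answered the csi callback for this assignment."""
        self._send(("susi_resp", su, si, ha_state))

    def component_fault(self, su, comp, recovery):
        """Only request recovery: assignments stay until AMFD removes them."""
        self._send(("recovery_req", su, comp, recovery))

    def amfd_up(self):
        """AMFD is back: flush everything in the order it was produced."""
        self.headless = False
        while self._pending:
            self.sent.append(self._pending.popleft())


buf = HeadlessBuffer()
buf.assignment_done("SU1", "SI1", "quiesced")
buf.component_fault("SU1", "comp1", "comp_failover")
sent_while_headless = list(buf.sent)
buf.amfd_up()
```

Keeping a single ordered queue means AMFD sees the assignment completion 
before the recovery request. With SI dep, responses for several dependent SIs 
would simply be appended to the same queue.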

Thanks,
Praveen


---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** review
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Thu Aug 11, 2016 12:12 PM UTC
**Owner:** Minh Hon Chau


This ticket is more of an enhancement that targets how AMFD detects and 
recovers the transient SUSIs left over from headless. There are three major 
situations:
(1) - Cluster goes headless, su/node failover on any payload can happen, or 
any payload can be hard rebooted/powered off by the operator, then the 
cluster recovers
(2) - An admin op is issued on any AMF entity, then the cluster goes 
headless. During headless, the intermediate HA assignments of the whole admin 
op sequence between AMFND and components could be:
    (2.1) The assignment completes, the component returns OK in the csi 
callback, then the cluster recovers
    (2.2) The assignment is ongoing, then the cluster recovers. The 
assignment could complete afterward, or the csi callback could return 
FAILED_OPERATION, or an error can also happen

At the time the cluster recovers, amfd has collected all assignments from all 
amfnd(s). These assignments can be in assigned or assigning states while 
their HA states do not conform to the SG redundancy model. Any of (1), (2.1), 
(2.2) can happen in combination, which means that while an admin op (2) is in 
progress, the cluster can go headless and any kind of failover (1) can happen 
during headless.
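
As an illustration of the detection step, the Python sketch below flags SIs 
whose collected assignments look transient. It assumes a 2N SG in which a 
settled SI has exactly one active assignment and at most one standby. The 
tuple layout and the `find_transient_sis` name are hypothetical, not AMFD's 
data model.

```python
def find_transient_sis(susis):
    """Return the sorted list of SIs that need recovery after headless.

    susis: iterable of (su, si, ha_state, fsm_state) tuples as collected
    from the amfnd(s). fsm_state is "assigned" or "assigning" (assumption).
    """
    by_si = {}
    for su, si, ha_state, fsm_state in susis:
        by_si.setdefault(si, []).append((ha_state, fsm_state))

    transient = []
    for si, entries in sorted(by_si.items()):
        ha_states = sorted(ha for ha, _ in entries)
        # Still being assigned when the cluster recovered (case 2.2).
        in_progress = any(fsm != "assigned" for _, fsm in entries)
        # Assumed 2N steady state: one active, optionally one standby.
        conforms = ha_states in (["active"], ["active", "standby"])
        if in_progress or not conforms:
            transient.append(si)
    return transient


collected = [
    ("SU1", "SI1", "active", "assigned"),
    ("SU2", "SI1", "standby", "assigned"),
    ("SU1", "SI2", "quiesced", "assigned"),   # admin op stopped midway (2.1)
    ("SU2", "SI2", "standby", "assigned"),
    ("SU1", "SI3", "active", "assigning"),    # assignment still ongoing (2.2)
]
transient = find_transient_sis(collected)
```

Here SI1 is settled, SI2 has no active assignment, and SI3 is still being 
assigned, so only SI2 and SI3 are reported.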




