[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless

Minh Hon Chau Tue, 26 Apr 2016 23:58:51 -0700

I have attached a prototype patch that applies the idea of resuming the SG FSM 
state. It depends on the patch for #1723.


The patch works when an ongoing CSI callback returns after recovery and no 
failover happens during headless, for most lock/unlock/shutdown operations on 
SU/SI/SG. There are still some edge cases where the mapping from HA/assignment 
state to SG FSM state is ambiguous; the best approach would be to store the SG 
FSM state as an attribute in IMM.
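
To illustrate the ambiguity, here is a minimal sketch, not amfd code. The enum 
names and `candidate_sg_fsm_states` are hypothetical, and the rule "all 
assigned means stable" is a simplifying assumption that will not hold in every 
case. It only shows that a transient assignment alone leaves several SG FSM 
states possible, while a persisted value removes the guesswork.

```python
from enum import Enum
from typing import Iterable, Optional, Set


class SgFsm(Enum):
    # Illustrative SG FSM states; names are assumptions, not the amfd enum.
    STABLE = "stable"
    SG_REALIGN = "sg_realign"
    SU_OPER = "su_oper"
    SI_OPER = "si_oper"
    SG_ADMIN = "sg_admin"


class SusiFsm(Enum):
    # Illustrative per-assignment states.
    ASSIGNED = "assigned"
    ASSIGNING = "assigning"
    MODIFYING = "modifying"
    UNASSIGNING = "unassigning"


def candidate_sg_fsm_states(
    susi_states: Iterable[SusiFsm],
    persisted: Optional[SgFsm] = None,
) -> Set[SgFsm]:
    """Return the SG FSM states that are consistent with the assignments."""
    if persisted is not None:
        # A value saved before headless (e.g. in IMM) is unambiguous.
        return {persisted}
    if all(s is SusiFsm.ASSIGNED for s in susi_states):
        # Simplifying assumption: nothing in progress means stable.
        return {SgFsm.STABLE}
    # A transient assignment could belong to any non-stable SG FSM state.
    return set(SgFsm) - {SgFsm.STABLE}


if __name__ == "__main__":
    seen = [SusiFsm.ASSIGNED, SusiFsm.MODIFYING]
    print(sorted(s.value for s in candidate_sg_fsm_states(seen)))
    print(candidate_sg_fsm_states(seen, persisted=SgFsm.SU_OPER))
```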

If this case (an ongoing CSI callback returning after recovery) can be solved, 
then the other case, where the CSI callback completed during headless, can be 
solved by having amfnd re-send to amfd, after headless, the su_si reports that 
it was not able to send during headless.
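
The re-send idea could look roughly like the sketch below. This is only an 
illustration under assumptions: `ReportBuffer`, `set_headless` and `report` 
are hypothetical names, and reports are plain strings rather than real su_si 
messages. Reports produced while headless are queued and then flushed in 
order once the director is reachable again.

```python
from collections import deque
from typing import Callable, Deque


class ReportBuffer:
    """Queue reports while headless; flush them in order on recovery."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send              # hypothetical transport to amfd
        self._headless = False
        self._pending: Deque[str] = deque()

    def set_headless(self, headless: bool) -> None:
        self._headless = headless
        if not headless:
            # Director is back: deliver everything that was held back.
            while self._pending:
                self._send(self._pending.popleft())

    def report(self, msg: str) -> None:
        if self._headless:
            self._pending.append(msg)  # cannot send now, keep for later
        else:
            self._send(msg)


if __name__ == "__main__":
    sent = []
    buf = ReportBuffer(sent.append)
    buf.set_headless(True)
    buf.report("su_si: SU1/SI1 quiesced done")
    buf.set_headless(False)
    print(sent)
```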

However, if a node reboots or an SU fails over during headless, the SG FSM 
code still does not work after its state is resumed, because one of the 
assignments of another SU was deleted and the current SG FSM code is not 
designed to handle this situation.

The next step could be to try reusing node_fail(), or otherwise to treat the 
transient states as a delayed failover.

Any ideas please?


Attachments:

- [1620_12_amfd_adjust_ongoing_susi.diff](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/7b203666/5185/attachment/1620_12_amfd_adjust_ongoing_susi.diff) (28.0 kB; text/x-patch)


---

**[tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** accepted
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Fri Apr 22, 2016 11:08 AM UTC
**Owner:** Minh Hon Chau


This ticket is more of an enhancement that targets how AMFD detects and 
recovers the transient SUSIs left over from headless. There are three major 
situations:
(1) - The cluster goes headless, SU/node failover can happen on any payload, 
then the cluster recovers.
(2) - An admin op is issued on any AMF entity, then the cluster goes headless. 
During headless, the intermediate HA assignments of the admin op sequence 
between AMFND and components could be:
    (2.1) The assignment completes, the component returns OK to the CSI 
callback, then the cluster recovers.
    (2.2) The assignment is ongoing when the cluster recovers. Afterward the 
assignment could complete, the CSI callback could return FAILED_OPERATION, or 
an error could occur.

At the time the cluster recovers, amfd has collected all assignments from all 
amfnd(s). These assignments can be in assigned or assigning states while their 
HA states do not conform to the SG redundancy model. Any of (1), (2.1) and 
(2.2) can happen in combination, which means that while an admin op is in 
progress (2), the cluster can go headless and any kind of failover (1) can 
happen during headless.
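
As a rough sketch of the detection step, and not the amfd implementation, the 
collected assignments could be scanned as shown below. The `Susi` record, 
`find_suspect_assignments`, and the rule "exactly one active assignment per 
SI" (a 2N-like simplification) are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Susi:
    su: str
    si: str
    ha_state: str  # e.g. "active", "standby", "quiesced"
    fsm: str       # "assigned", or a transient state such as "assigning"


def find_suspect_assignments(
    susis: Iterable[Susi],
) -> Tuple[List[Susi], List[str]]:
    """Return (transient assignments, SIs whose HA states look wrong)."""
    susis = list(susis)
    # Assignments still in progress when the cluster recovered (case 2.2).
    transient = [s for s in susis if s.fsm != "assigned"]
    # Assumed 2N-like rule: each SI should have exactly one active SUSI.
    active_per_si = Counter(s.si for s in susis if s.ha_state == "active")
    nonconforming = sorted(
        si for si in {s.si for s in susis} if active_per_si[si] != 1
    )
    return transient, nonconforming


if __name__ == "__main__":
    collected = [
        Susi("SU1", "SI1", "quiesced", "assigned"),
        Susi("SU2", "SI1", "standby", "assigned"),
        Susi("SU1", "SI2", "active", "modifying"),
        Susi("SU2", "SI2", "standby", "assigned"),
    ]
    transient, nonconforming = find_suspect_assignments(collected)
    print(transient)
    print(nonconforming)
```

Here SI1 has completed assignments but no active one, as in (2.1), while SI2 
still has an assignment in progress, as in (2.2).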





