[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless

Minh Hon Chau Thu, 11 Aug 2016 05:12:30 -0700

Hi Praveen,

The second patch of part 1 should have had a description, which I forgot to add.


In case 1), AMFD could trigger susi_success() itself to continue the admin 
operation. However, AMFD needs to know the third parameter of susi_success(), 
which is either NULL or an existing SUSI. These two cases depend on whether the 
previous admin operation was on an SU or on an SI, so AMFD needs a way to tell 
them apart. In addition, the su_si message also carries the operation at CSI 
level, so AMFD would have to examine the CSI assignments collected after 
headless more closely in order to continue the operation by self-triggering.

This part 1 covers all the normal cases. There is always a time gap between 
AMFD sending out the su_si event and AMFND responding to it. In the 
non-headless case this gap is trivially small; in headless it is simply 
assumed to be longer. Case 1) in headless can then be seen as AMFND retrying 
to send back the su_si response until AMFD is back. In other words, AMFND can 
buffer the response and resend it after AMFD has restored all previous states 
from IMM. The su_si event contains all the information that AMFD needs to 
resume the operation.
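
To illustrate the buffering idea, here is a minimal Python sketch. It is not the 
AMFND code: the class names, fields and the send callback are all hypothetical, 
and using None for the SI to stand for an SU-level operation is my assumption. 
It only shows responses being queued while the director is away and replayed in 
order once it is back.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SuSiResponse:
    """Hypothetical stand-in for the su_si response sent from AMFND to AMFD."""
    su: str
    si: Optional[str]  # assumption: None models an SU-level operation
    result: str        # e.g. "OK" or "FAILED"


class NodeDirectorSketch:
    """Toy model of AMFND buffering su_si responses during headless."""

    def __init__(self, send: Callable[[SuSiResponse], None]) -> None:
        self._send = send            # delivers one message to the director
        self._director_up = True
        self._buffered = deque()     # responses waiting for the director

    def on_director_down(self) -> None:
        self._director_up = False

    def on_director_restored(self) -> None:
        # Called once the director has restored its state; replay in FIFO order.
        self._director_up = True
        while self._buffered:
            self._send(self._buffered.popleft())

    def respond(self, msg: SuSiResponse) -> None:
        # Send immediately when the director is up, otherwise buffer.
        if self._director_up:
            self._send(msg)
        else:
            self._buffered.append(msg)
```

Because the buffered message is the same one that would have been sent without 
headless, the receiving side does not need a separate headless code path to 
interpret it.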

During headless, if an error occurs before the SUSI assignment completes, the 
su_si response event is not buffered. The other case is that the SUSI 
assignment completes and then an error happens. In that case no SUSI is sent 
in the state_info recovery message. AMFD reads the SUSI fsm states from IMM 
and sees this SUSI in MODIFY/UNASGN/ASGN, and AMFD then receives the recovery 
request buffered in AMFND. From AMFD's point of view, this situation looks 
like an error happening while AMFND's assignment is ongoing in non-headless. 
In avnd_diq_rec_send_buffered_msg() there is an "#if 0" block that is intended 
to avoid sending the su_si response event for an already-removed SUSI. Part 2 
covers the error cases, with or without an admin operation. I'm still working 
on it, so there may be minor changes in avnd_diq_rec_send_buffered_msg().
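
The intent of skipping responses for already-removed SUSIs could look roughly 
like the Python sketch below. This is only an illustration, not the contents of 
avnd_diq_rec_send_buffered_msg(): the function name and the tuple layout are 
hypothetical, and SU-level responses (no specific SI) are left out for brevity.

```python
def messages_to_resend(buffered, existing_susis):
    """Return the buffered responses whose SUSI still exists.

    buffered:       list of (su, si, result) tuples, oldest first (hypothetical layout)
    existing_susis: set of (su, si) pairs currently present on the node
    """
    # Keep the original order; drop responses that refer to a removed SUSI.
    return [msg for msg in buffered if (msg[0], msg[1]) in existing_susis]


buffered = [("SU1", "SI1", "OK"), ("SU2", "SI1", "OK")]
existing = {("SU1", "SI1")}  # the SU2/SI1 assignment was removed during headless
to_send = messages_to_resend(buffered, existing)
```

Here only the SU1/SI1 response would be resent.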

Overall, for case 1 in part 1, I think buffering the su_si event saves AMFD 
from adding more headless-specific code and an additional field in the 
state_info message. This is my thinking for now; if you see a weakness in it, 
please let me know.

Thanks,
Minh


---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** review
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Thu Aug 11, 2016 11:00 AM UTC
**Owner:** Minh Hon Chau


This ticket is more of an enhancement. It targets how AMFD detects and 
recovers the transient SUSIs left over from headless. There are three major 
situations:
(1) - The cluster goes headless; su/node failover can happen on any payload, 
or any payload can be hard rebooted/powered off by the operator; then the 
cluster recovers.
(2) - An admin op is issued on any AMF entity and the cluster goes headless. 
During headless, the intermediate HA assignments of the whole admin op 
sequence between AMFND and components could be:
    (2.1) The assignment completes, the component returns OK for the csi 
callback, then the cluster recovers.
    (2.2) The assignment is in progress when the cluster recovers. Afterward 
the assignment could complete, the csi callback could return FAILED_OPERATION, 
or an error could happen.
    
At the time the cluster recovers, amfd has collected all assignments from all 
amfnd(s). These assignments can be in assigned or assigning states while their 
HA states do not conform to the SG redundancy model. Any of (1), (2.1) and 
(2.2) can happen in combination, which means that while an admin op (2) is in 
progress, the cluster can go headless and any kind of failover (1) can happen 
during headless.
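
As a rough illustration of the detection step, the Python sketch below picks 
out assignments whose fsm state is still transient. It is not the amfd code: 
the dictionary layout and function name are hypothetical, and the state labels 
are just the short names used in this thread (ASGN, MODIFY, UNASGN for ongoing 
operations, ASGND assumed here for a completed assignment).

```python
# Short labels as used in the discussion; treated as "operation still ongoing".
TRANSIENT_FSM_STATES = {"ASGN", "MODIFY", "UNASGN"}


def find_transient_susis(susis):
    """Return the SUSIs whose fsm state shows an unfinished operation.

    susis: list of dicts with hypothetical keys "su", "si" and "fsm".
    """
    return [s for s in susis if s["fsm"] in TRANSIENT_FSM_STATES]


collected = [
    {"su": "SU1", "si": "SI1", "fsm": "ASGND"},   # stable, nothing to recover
    {"su": "SU2", "si": "SI1", "fsm": "MODIFY"},  # left over from headless
    {"su": "SU3", "si": "SI2", "fsm": "UNASGN"},  # left over from headless
]
transient = find_transient_susis(collected)
```

Checking whether the collected HA states conform to the SG redundancy model is 
a separate step and is not covered by this sketch.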




