[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless

Praveen Tue, 23 Aug 2016 04:29:35 -0700

Hi Minh,
I am going through the patches in 1725_phase1.tgz. Some initial comments:
1) In patch 2, avnd_diq_rec_send_buffered_msg() sends the buffered message to
AMFD only if the SUSI is present. If removal of assignments completes during
headless, AMFND deletes the SUSIs in su_si_oper_done(). So AMFND will never
send the assignment message and the admin operation will not continue.
2) In patch 1, I think after headless we will not get any invocation id for
the admin operation that was going on before headless. Since AMF is continuing
the admin operation, we should somehow prevent other admin operations from
starting, by setting some magic number for the invocation id or in some other
way.
3) If suswitch is in TOGGLED state, then I think we should cross-check that
there are at least two SUs having assignments. The reason is that if this flag
remains TOGGLED and the admin op does not continue, there is very little
chance that it will get reset, as it is used only in the si-swap flow.
4) Assignments are in progress here. This could be because of an admin
operation or faults. AMFD should call one function here, like log_admin_op().
This function will search for the entity that is under admin operation and log
details like:
- "After headless state admin op on '%s' is continuing" in syslog.
- Also traces for SUSI states which are not assigned.
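
A rough sketch of the checks suggested in 3) and 4), to make the idea concrete.
Everything here is an assumption for illustration: the types `Susi` and
`Entity`, the field names, and `toggled_flag_is_plausible()` are hypothetical
stand-ins and not the real amfd classes. Only the name log_admin_op() and the
syslog text come from the comments above. The function returns the lines it
would log instead of calling the real logging facility.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only, not the real amfd types.
enum class SusiFsm { Assigned, Assigning, Modifying, Unassigning };

struct Susi {
  std::string su;
  std::string si;
  SusiFsm fsm;
};

struct Entity {
  std::string name;
  bool admin_op_pending;   // admin op was running when the cluster went headless
  bool su_switch_toggled;  // stands in for the si-swap TOGGLED flag
  std::vector<Susi> susis;
};

const char* fsm_name(SusiFsm f) {
  switch (f) {
    case SusiFsm::Assigned:    return "ASSIGNED";
    case SusiFsm::Assigning:   return "ASSIGNING";
    case SusiFsm::Modifying:   return "MODIFYING";
    case SusiFsm::Unassigning: return "UNASSIGNING";
  }
  return "UNKNOWN";
}

// Point 3: TOGGLED is only plausible if at least two distinct SUs hold
// assignments; otherwise the flag is stale and should be reset.
bool toggled_flag_is_plausible(const Entity& e) {
  std::set<std::string> sus;
  for (const auto& s : e.susis) sus.insert(s.su);
  return sus.size() >= 2;
}

// Point 4: find entities whose admin op continues after headless and
// report every SUSI that is not yet in the assigned state.
std::vector<std::string> log_admin_op(const std::vector<Entity>& entities) {
  std::vector<std::string> lines;
  for (const auto& e : entities) {
    if (!e.admin_op_pending) continue;
    lines.push_back("After headless state admin op on '" + e.name +
                    "' is continuing");
    for (const auto& s : e.susis) {
      if (s.fsm != SusiFsm::Assigned) {
        lines.push_back("  SUSI " + s.su + "," + s.si +
                        " fsm=" + fsm_name(s.fsm));
      }
    }
  }
  return lines;
}
```

In AMFD the returned lines would go to syslog and trace rather than a vector,
and a failed plausibility check would be the place to reset the flag.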


Thanks,
Praveen




---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** review
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Tue Aug 23, 2016 11:11 AM UTC
**Owner:** Minh Hon Chau


This ticket is more of an enhancement that targets how AMFD detects and
recovers the transient SUSIs left over from headless. There are three major
situations:
(1) - The cluster goes headless; su/node failover can happen on any payload,
or any payload can be hard rebooted/powered off by the operator; then the
cluster recovers.
(2) - An admin op is issued on any AMF entity, then the cluster goes headless.
During headless, the intermediate HA assignments of the whole admin op
sequence between AMFND and components could be:
    (2.1) The assignment completes, the component returns OK to the csi
callback, then the cluster recovers.
    (2.2) The assignment is ongoing, then the cluster recovers. The assignment
could complete afterward, or the csi callback could return FAILED_OPERATION,
or an error could also happen.
    
At the time the cluster recovers, amfd has collected all assignments from all
amfnd(s). These assignments can be in assigned or assigning states while their
HA states do not conform to the SG redundancy model. Any of (1), (2.1), (2.2)
can happen in combination, which means that while an admin op (2) is in
progress, the cluster can go headless and any kind of failover (1) can happen
during headless.
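
As an illustration of the detection step described above, here is a minimal
sketch that classifies the assignments collected for one SI. It assumes a
simplified 2N rule (exactly one active and at most one standby per SI) and
treats quiesced/quiescing as part of an unfinished sequence. The types and the
function `classify_2n()` are hypothetical and are not taken from the patches.

```cpp
#include <string>
#include <vector>

// Hypothetical types for illustration only, not the real amfd classes.
enum class HaState { Active, Standby, Quiesced, Quiescing };
enum class Fsm { Assigned, Assigning };

struct CollectedSusi {
  std::string su;
  HaState ha;
  Fsm fsm;
};

enum class Verdict { Conformant, Transient, NonConformant };

// Classify the SUSIs that amfd collected for a single SI after headless.
Verdict classify_2n(const std::vector<CollectedSusi>& susis) {
  int active = 0;
  int standby = 0;
  bool in_progress = false;
  for (const auto& s : susis) {
    if (s.fsm != Fsm::Assigned) in_progress = true;  // still assigning: (2.2)
    if (s.ha == HaState::Active) {
      ++active;
    } else if (s.ha == HaState::Standby) {
      ++standby;
    } else {
      in_progress = true;  // assumption: quiesced/quiescing means mid-sequence
    }
  }
  // An unfinished assignment sequence must be driven to completion first.
  if (in_progress) return Verdict::Transient;
  // All SUSIs are assigned, but the HA states break the simplified 2N rule,
  // e.g. the active SU was powered off during headless: situation (1).
  if (!susis.empty() && (active != 1 || standby > 1)) {
    return Verdict::NonConformant;
  }
  return Verdict::Conformant;
}
```

For example, a lone standby in assigned state is reported as NonConformant,
while an active SUSI still in assigning state is reported as Transient.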



