[tickets] [opensaf:tickets] #1725 AMF: Recover transient SUSIs left over from headless

Srikanth R Mon, 20 Jun 2016 22:11:44 -0700

When a fault occurs during headless, AMF leaves the application in the same 
state and logs the following in syslog on the payload hosting the SU.


Sep  7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed, 
SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN : 
SI=safSi=TestApp_SI1,safApp=TestApp_TwoN
Sep  7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed, 
SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN : 
SI=safSi=TestApp_SI2,safApp=TestApp_TwoN
Sep  7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed, 
SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN : 
SI=safSi=TestApp_SI3,safApp=TestApp_TwoN
Sep  7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed, 
SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN : 
SI=safSi=TestApp_SI4,safApp=TestApp_TwoN


  In the above situation, the application with the active assignment faulted 
during headless and the node rebooted. Once the controller joins, the above 
syslog is printed and the application is left with ONLY the standby assignment.
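
  To list which SU/SI pairs were affected, the syslog entries quoted above can 
be parsed with a small script. This is only a sketch: the pattern is inferred 
from the four log lines in this report, `failed_susi_pairs` is a hypothetical 
helper name, and it assumes the DNs contain no whitespace.

```python
import re

# Pattern inferred only from the log lines quoted in this ticket (assumption).
# \s* also matches newlines, so entries wrapped across lines still match.
FAILED_SUSI = re.compile(
    r"SU-SI record addition failed,\s*SU=\s*(?P<su>\S+)\s*:\s*SI=\s*(?P<si>\S+)"
)


def failed_susi_pairs(syslog_text):
    """Return (su_dn, si_dn) pairs from 'SU-SI record addition failed' entries."""
    return [(m.group("su"), m.group("si")) for m in FAILED_SUSI.finditer(syslog_text)]


sample = """\
Sep  7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed,
SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN :
SI=safSi=TestApp_SI1,safApp=TestApp_TwoN
Sep  7 19:38:39 SCALE_SLOT-94 osafamfnd[5104]: CR SU-SI record addition failed,
SU= safSu=TestApp_SU2,safSg=TestApp_SG1,safApp=TestApp_TwoN :
SI=safSi=TestApp_SI2,safApp=TestApp_TwoN
"""

for su, si in failed_susi_pairs(sample):
    print(su, "->", si)
```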
  
   If an AMF application is left with improper assignments, and this ticket 
targets the above scenario and others like #1869, then this ticket should be 
marked as a **defect**. 


---

** [tickets:#1725] AMF: Recover transient SUSIs left over from headless**

**Status:** accepted
**Milestone:** 5.1.FC
**Created:** Wed Apr 06, 2016 07:16 AM UTC by Minh Hon Chau
**Last Updated:** Thu May 05, 2016 12:22 PM UTC
**Owner:** Minh Hon Chau


This ticket is more likely an enhancement that targets how AMFD detects and 
recovers the transient SUSIs left over from headless. There are three major 
situations:
(1) - The cluster goes headless, su/node failover can happen on any payload, 
then the cluster recovers
(2) - An admin op is issued on any AMF entity, then the cluster goes headless. 
During headless, the intermediate HA assignments of the whole admin op 
sequence between AMFND and the components could be:
    (2.1) The assignment completes, the component returns OK from the csi 
callback, then the cluster recovers
    (2.2) The assignment is in progress when the cluster recovers. The 
assignment could complete afterward, the csi callback could return 
FAILED_OPERATION, or another error could occur
    
By the time the cluster recovers, amfd has collected all assignments from all 
amfnd(s). These assignments can be in assigned or assigning states while their 
HA states do not conform to the SG redundancy model. Any of (1), (2.1) and 
(2.2) can occur in combination, which means that while an admin op (2) is in 
progress, the cluster can go headless and any kind of failover (1) can happen 
during headless.  
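
The check described above can be sketched as follows. This is an illustration 
only, not AMFD code: the `Susi` record, its field names, and 
`check_2n_assignments` are hypothetical, and the conformance rule is a 
simplified assumption for a 2N SG (every SI that has any assignment must have 
exactly one active and at most one standby).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Susi:
    # Hypothetical record of one assignment collected from an amfnd.
    su: str
    si: str
    ha_state: str   # "active" or "standby"
    fsm_state: str  # "assigned" or "assigning"


def check_2n_assignments(susis):
    """Return (transient, nonconforming) for the collected SUSIs.

    transient: SUSIs still in the "assigning" state.
    nonconforming: SI -> sorted HA states, for SIs that break the
    simplified 2N rule (exactly one active, at most one standby).
    """
    transient = [s for s in susis if s.fsm_state == "assigning"]

    ha_by_si = defaultdict(list)
    for s in susis:
        ha_by_si[s.si].append(s.ha_state)

    nonconforming = {}
    for si, states in ha_by_si.items():
        if states.count("active") != 1 or states.count("standby") > 1:
            nonconforming[si] = sorted(states)
    return transient, nonconforming


collected = [
    Susi("SU2", "SI1", "standby", "assigned"),   # active was lost during headless
    Susi("SU1", "SI2", "active", "assigning"),   # admin op still in progress
    Susi("SU2", "SI2", "standby", "assigned"),
]
transient, nonconforming = check_2n_assignments(collected)
print(transient)
print(nonconforming)
```

Here SI1 is reported as nonconforming because it has only a standby 
assignment, and the SU1/SI2 assignment is reported as transient because it is 
still assigning.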



