Hi Minh,
As mentioned in this ticket, I agreed that both loss of RTA updates and loss of 
MBCSv checkpoints can lead to problems. But RTA losses may only show wrong 
states to a user, and will affect a normal cluster only after SC absence has 
been observed. I did not get how loss of RTA affects a normal cluster when the 
SC absence feature is disabled. Does AMFD read any runtime data from IMM during 
a failover?
MBCSv-related loss can occur with any director, and the problem needs a general 
approach, to be discussed with the other service owners.
If this ticket is aimed at fixing only this problem, then I think it can be 
pushed to other branches as well. The approach in patch 2210_v2.diff seems OK 
to me. Don't you think the approach you took in 2210_v2.diff is consistent 
with the approach you worked on in #1725? In #1725, AMF continues both admin 
operations and recovery from faults. That is why I was questioning the use of 
su_fault() and reverting the admin op based on this alone.

Thanks,
Praveen

---

**[tickets:#2210] AMFD: Loss of RT attribute update before headless**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Mon Nov 28, 2016 10:18 PM UTC by Minh Hon Chau
**Last Updated:** Wed Feb 01, 2017 04:54 AM UTC
**Owner:** Minh Hon Chau


A lost IMM runtime update of saAmfSIAdminState in AMFD has been seen just before 
the cluster goes headless. It results in a coredump after the headless period.

One scenario is:
- Issue amf-adm shutdown on the SI, and delay the CSI quiescing callback
- Stop the SCs, then release the CSI quiescing callback
- Restart the SCs
Observation: saAmfSIAdminState is read as UNLOCKED while the related SUSI is 
QUIESCED, and AMFD dumps core as below:

~~~
Thread 1 (Thread 0x7fec174a0780 (LWP 493)):
#0  0x00000000004fbfd5 in SG_2N::node_fail_si_oper (this=0x24109d0, su=0x2413440) at sg_2n_fsm.cc:3102
        s_susi = 0x8f50000000b
        susi_temp = 0x5fa169
        o_su = 0x2417f98
        __FUNCTION__ = "node_fail_si_oper"
        cb = 0x919240 <_control_block>
#1  0x00000000004fe69c in SG_2N::node_fail (this=0x24109d0, cb=0x919240 <_control_block>, su=0x2413440) at sg_2n_fsm.cc:3469
        a_susi = 0x1
        s_susi = 0x7fffedecd2d0
        o_su = 0x5a50bd <AVD_SU::any_susi_fsm_in(unsigned int)+497>
        flag = 2
        __FUNCTION__ = "node_fail"
        su_ha_state = 0
#2  0x0000000000513010 in AVD_SG::failover_absent_assignment (this=0x24109d0) at sg.cc:2273
        su = @0x2411330: 0x2413440
        __for_range = std::vector of length 2, capacity 2 = {0x2413440, 0x24111e0}
        __for_begin = 
        __for_end = 
        __FUNCTION__ = "failover_absent_assignment"
#3  0x000000000043be65 in avd_cluster_tmr_init_evh (cb=0x919240 <_control_block>, evt=0x7fec04000df0) at cluster.cc:103
        i_sg = 0x24109d0
        it = {first = "safSg=1,safApp=osaftest", second = }
        __FUNCTION__ = "avd_cluster_tmr_init_evh"
        su = 0x0
        node = 0x240f9b0
~~~



---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets