Hi Praveen,

I'm fine with treating this as either an enhancement or a defect.
The issue here is in fact the loss of any IMM runtime attribute (RTA) or 
checkpointed variable. In a normal cluster, these are checkpointed via MBCSv. In 
an SC absence cluster, they are written to IMM. In general, this is a common 
issue for the cluster in both cases, where any of the following attributes or 
variables is lost at the standby amfd (failover) or at the new active amfd 
(headless):

- Entity Admin State (saAmfSIAdminState, saAmfSUAdminState, ...)
- SG FSM state: checkpointed via AVSV_CKPT_SG_FSM_STATE, and written to IMM as 
osafAmfSGFsmState
- SUSI info: checkpointed via AVSV_CKPT_AVD_SI_ASS, and written to IMM as 
osafAmfSISUFsmState and saAmfSISUHAState
- SU operation list: checkpointed via AVSV_CKPT_AVD_SG_OPER_SU, and written to 
IMM as osafAmfSGSuOperationList
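
To make the list above concrete, here is a rough C++ sketch that records, for 
each item, the MBCSv checkpoint record and the IMM runtime attribute(s) that 
carry it. The struct and the FindByImmAttr() helper are hypothetical and not 
amfd code; only the quoted names come from this discussion, and the admin state 
entry is partial because its checkpoint record is not named here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of amfd state that must survive failover (via MBCSv)
// or headless (via IMM runtime attributes). Not the real amfd data model.
struct PersistedState {
  std::string name;                    // logical amfd state
  std::string mbcsv_record;            // empty if not named in the discussion
  std::vector<std::string> imm_attrs;  // IMM runtime attributes
};

const std::vector<PersistedState>& PersistedStates() {
  static const std::vector<PersistedState> kStates = {
      // Partial list: the ticket elides the remaining admin state attributes.
      {"Entity admin state", "", {"saAmfSIAdminState", "saAmfSUAdminState"}},
      {"SG FSM state", "AVSV_CKPT_SG_FSM_STATE", {"osafAmfSGFsmState"}},
      {"SUSI info", "AVSV_CKPT_AVD_SI_ASS",
       {"osafAmfSISUFsmState", "saAmfSISUHAState"}},
      {"SU operation list", "AVSV_CKPT_AVD_SG_OPER_SU",
       {"osafAmfSGSuOperationList"}},
  };
  return kStates;
}

// Returns the state item an IMM runtime attribute belongs to, or nullptr if
// the attribute is not one of the items listed above.
const PersistedState* FindByImmAttr(const std::string& attr) {
  for (const auto& s : PersistedStates()) {
    for (const auto& a : s.imm_attrs) {
      if (a == attr) return &s;
    }
  }
  return nullptr;
}
```

Keeping both channels side by side makes the point that the same logical state 
is lost whichever channel drops the update.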


Whether an MBCSv update is lost at the standby amfd (failover) or an RTA update 
to IMM is lost (headless), the same issue happens at the new active amfd: the 
SUSIs cannot be processed further. Do you agree? This ticket, though, only looks 
at it from the headless side.
It is a common issue for both normal and SC absence clusters, limited to the 
specific subset of lost states that relate to SUSI assignment.

It looks like you would prefer not to use su_fault() to remove the assignments 
of the SUs affected by the lost RTA. Instead, amfd should somehow detect the loss 
and move to the right states so that operations are not reverted. For example, 
if amfd finds a QUIESCED assignment and the SG FSM state is AVD_SG_FSM_SI_OPER, 
then saAmfSIAdminState should be self-corrected to LOCKED. Is this the solution 
you have in mind?
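
A minimal C++ sketch of that self-correction rule, purely to check that we mean 
the same thing. The enums and structs below are hypothetical simplifications 
(they only loosely mirror AVD_SG_FSM_SI_OPER, QUIESCED and LOCKED) and are not 
the real amfd types or APIs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for amfd types; not the real OpenSAF definitions.
enum class SgFsmState { kStable, kSgRealign, kSuOper, kSiOper };
enum class HaState { kActive, kStandby, kQuiescing, kQuiesced };
enum class AdminState { kUnlocked, kLocked, kShuttingDown };

struct Susi {
  HaState ha_state;
};

struct Si {
  AdminState admin_state;
  std::vector<Susi> assignments;
};

// Proposed rule: a QUIESCED assignment while the SG FSM is in SI_OPER means the
// SI admin operation has already progressed, so an admin state read back as
// anything other than LOCKED is treated as a lost update and set to LOCKED.
// Returns true if a correction was applied.
bool SelfCorrectSiAdminState(SgFsmState sg_fsm, Si& si) {
  if (sg_fsm != SgFsmState::kSiOper) return false;
  if (si.admin_state == AdminState::kLocked) return false;  // nothing to fix
  for (const Susi& s : si.assignments) {
    if (s.ha_state == HaState::kQuiesced) {
      si.admin_state = AdminState::kLocked;
      return true;
    }
  }
  return false;
}
```

In the scenario from the ticket, this would turn the UNLOCKED value read after 
headless into LOCKED instead of letting the SG FSM act on inconsistent state.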

Thanks,
Minh


---

**[tickets:#2210] AMFD: Loss of RT attribute update before headless**

**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Mon Nov 28, 2016 10:18 PM UTC by Minh Hon Chau
**Last Updated:** Mon Jan 30, 2017 08:20 AM UTC
**Owner:** Minh Hon Chau


A lost IMM RT update of saAmfSIAdminState in AMFD has been seen just before the 
cluster goes headless. It results in a coredump after headless.

One scenario is:
- Issue amf-admin shutdown on the SI, and delay the CSI quiescing callback
- Stop the SCs, then release the CSI quiescing callback
- Restart the SCs
Observation: saAmfSIAdminState is read as UNLOCKED while the related SUSI is 
QUIESCED, and amfd dumps core as below:

~~~
Thread 1 (Thread 0x7fec174a0780 (LWP 493)):
#0  0x00000000004fbfd5 in SG_2N::node_fail_si_oper (this=0x24109d0, su=0x2413440) at sg_2n_fsm.cc:3102
        s_susi = 0x8f50000000b
        susi_temp = 0x5fa169
        o_su = 0x2417f98
        __FUNCTION__ = "node_fail_si_oper"
        cb = 0x919240 <_control_block>
#1  0x00000000004fe69c in SG_2N::node_fail (this=0x24109d0, cb=0x919240 <_control_block>, su=0x2413440) at sg_2n_fsm.cc:3469
        a_susi = 0x1
        s_susi = 0x7fffedecd2d0
        o_su = 0x5a50bd <AVD_SU::any_susi_fsm_in(unsigned int)+497>
        flag = 2
        __FUNCTION__ = "node_fail"
        su_ha_state = 0
#2  0x0000000000513010 in AVD_SG::failover_absent_assignment (this=0x24109d0) at sg.cc:2273
        su = @0x2411330: 0x2413440
        __for_range = std::vector of length 2, capacity 2 = {0x2413440, 0x24111e0}
        __for_begin = 
        __for_end = 
        __FUNCTION__ = "failover_absent_assignment"
#3  0x000000000043be65 in avd_cluster_tmr_init_evh (cb=0x919240 <_control_block>, evt=0x7fec04000df0) at cluster.cc:103
        i_sg = 0x24109d0
        it = {first = "safSg=1,safApp=osaftest", second = }
        __FUNCTION__ = "avd_cluster_tmr_init_evh"
        su = 0x0
        node = 0x240f9b0
~~~



---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets