Hi Minh,
AMFD is not required to read any RTA (runtime attribute) if the SC-Absent
feature is disabled; RTAs are for the user only. If AMFD reads an RTA with the
SC-Absent feature disabled, that is a bug and it needs to be fixed. With the
feature disabled, AMFD should read only config data from IMM and should depend
on its own database for all runtime attributes and objects. Even when the
SC-Absent feature is enabled, AMFD should read RTAs only once, when the first
active AMFD is coming up.
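To make the rule above concrete, here is a minimal sketch of the decision. All
names (ImmReadScope, AmfdStartup, imm_read_scope) are hypothetical and exist
only for illustration; they are not identifiers from the AMFD code.

```cpp
#include <cassert>

// Hypothetical names for illustration; none of these are OpenSAF identifiers.
enum class ImmReadScope {
  kConfigOnly,       // read configuration objects and attributes only
  kConfigAndRuntime  // additionally read runtime attributes (RTA), once
};

struct AmfdStartup {
  bool sc_absent_enabled;       // is the SC-Absent feature switched on?
  bool first_active_coming_up;  // is this the first active AMFD coming up?
};

// Decide what AMFD may read from IMM, following the rule stated above:
// RTAs are read only with SC-Absent enabled, and only while the first
// active AMFD is coming up. In every other case only config data is read.
inline ImmReadScope imm_read_scope(const AmfdStartup& s) {
  if (s.sc_absent_enabled && s.first_active_coming_up) {
    return ImmReadScope::kConfigAndRuntime;
  }
  return ImmReadScope::kConfigOnly;
}

// Walk through the combinations that matter for this discussion.
inline void check_read_policy() {
  // SC-Absent disabled: never read RTAs, whatever the startup phase.
  assert(imm_read_scope({false, true}) == ImmReadScope::kConfigOnly);
  // SC-Absent enabled, but not the first active: rely on AMFD's own database.
  assert(imm_read_scope({true, false}) == ImmReadScope::kConfigOnly);
  // SC-Absent enabled and first active coming up: the one-time RTA read.
  assert(imm_read_scope({true, true}) == ImmReadScope::kConfigAndRuntime);
}
```

Any RTA read that happens while this function would return kConfigOnly is the
kind of bug described above.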
In ticket #2228, I think the issue is not MBCSV loss. It may be either that an
async update arrives at the standby during the COLD sync phase and gets
ignored, or that the active dies before updating SU states to IMM. The first
case can be fixed by not ignoring the update. In the second case, only the user
is affected and AMFD has the correct states in its database.
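One way "not ignoring it" could look for the first case is to defer async
updates that arrive during cold sync and replay them once cold sync completes.
This is only a sketch under that assumption: StandbySync and AsyncUpdate are
hypothetical types, not the MBCSV or AMFD API, and the actual fix may be done
differently.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; this is not the MBCSV or AMFD API.
struct AsyncUpdate {
  std::string object;  // e.g. the DN of an SU
  std::string state;   // the new state carried by the update
};

class StandbySync {
 public:
  // Called on the standby when an async update arrives from the active.
  void on_async_update(AsyncUpdate upd) {
    if (cold_sync_in_progress_) {
      pending_.push_back(std::move(upd));  // defer instead of dropping
    } else {
      applied_.push_back(std::move(upd));  // normal path: apply immediately
    }
  }

  // Called once when cold sync completes: replay deferred updates in the
  // order they arrived, so the standby ends up with the latest states.
  void on_cold_sync_complete() {
    assert(cold_sync_in_progress_);  // expected to be called only once
    cold_sync_in_progress_ = false;
    while (!pending_.empty()) {
      applied_.push_back(std::move(pending_.front()));
      pending_.pop_front();
    }
  }

  const std::vector<AsyncUpdate>& applied() const { return applied_; }
  std::size_t pending_count() const { return pending_.size(); }

 private:
  bool cold_sync_in_progress_ = true;  // standby starts in cold sync
  std::deque<AsyncUpdate> pending_;    // updates received during cold sync
  std::vector<AsyncUpdate> applied_;   // updates applied to the local database
};
```

The point of the queue is ordering: an update that raced with cold sync is
applied after the cold sync data, so it is not overwritten by older state.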
Loss of a large number of RTAs can surely happen, and it will make recovery
after headless state impossible in the current implementation. This is properly
documented as of now. Please raise a separate enhancement ticket for the RTA
loss problem for 5.3; all these approaches should be discussed under that
ticket. This ticket fixes the case highlighted in the description with a small
change and thus falls into the defect category.
Thanks,
Praveen
---
**[tickets:#2210] AMFD: Loss of RT attribute update before headless**
**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Mon Nov 28, 2016 10:18 PM UTC by Minh Hon Chau
**Last Updated:** Thu Feb 02, 2017 04:58 AM UTC
**Owner:** Minh Hon Chau
A loss of the IMM RT saAmfSIAdminState update in AMFD has been seen just before
the cluster goes headless. It results in a coredump after headless.
One scenario is:
- Issue amf-admin shutdown SI, delay csi quiescing callback
- Stop SCs, release csi quiescing callback
- Restart SCs
Observation: saAmfSIAdminState is read as UNLOCKED while the related SUSI was
QUIESCED, and AMFD coredumps as below
~~~
Thread 1 (Thread 0x7fec174a0780 (LWP 493)):
#0 0x00000000004fbfd5 in SG_2N::node_fail_si_oper (this=0x24109d0, su=0x2413440) at sg_2n_fsm.cc:3102
s_susi = 0x8f50000000b
susi_temp = 0x5fa169
o_su = 0x2417f98
__FUNCTION__ = "node_fail_si_oper"
cb = 0x919240 <_control_block>
#1 0x00000000004fe69c in SG_2N::node_fail (this=0x24109d0, cb=0x919240 <_control_block>, su=0x2413440) at sg_2n_fsm.cc:3469
a_susi = 0x1
s_susi = 0x7fffedecd2d0
o_su = 0x5a50bd <AVD_SU::any_susi_fsm_in(unsigned int)+497>
flag = 2
__FUNCTION__ = "node_fail"
su_ha_state = 0
#2 0x0000000000513010 in AVD_SG::failover_absent_assignment (this=0x24109d0) at sg.cc:2273
su = @0x2411330: 0x2413440
__for_range = std::vector of length 2, capacity 2 = {0x2413440, 0x24111e0}
__for_begin =
__for_end =
__FUNCTION__ = "failover_absent_assignment"
#3 0x000000000043be65 in avd_cluster_tmr_init_evh (cb=0x919240 <_control_block>, evt=0x7fec04000df0) at cluster.cc:103
i_sg = 0x24109d0
it = {first = "safSg=1,safApp=osaftest", second = }
__FUNCTION__ = "avd_cluster_tmr_init_evh"
su = 0x0
node = 0x240f9b0
~~~