Hi Minh,
Sorry for the late reply. Somehow updating this ticket does not send email.
Regarding the limitations: since #1725 was pushed in 5.1, these limitations
also apply to 5.1. That is why I said this ticket should be an enhancement.
Loss of MBCSv checkpointing and loss of RTA updates can occur only while the
controllers are up, irrespective of whether the SC-Absent feature is enabled.
We should treat the two as separate issues. MBCSv loss may be a problem in a
failover/switchover situation in a normal cluster. Loss of RTA is not a problem
in a normal cluster, as the runtime attributes are only for the user to read,
but in an SC-Absent cluster it becomes a problem after recovery. This ticket is
about the loss of RTA.
**Normal cluster**: We have been solving RTA-related issues in failover
situations in this area (#1141, #2009, etc.), and enhancement #2252 currently
exists for exactly that purpose. Regarding loss of MBCSv checkpointing, I
believe this is the first time such an issue has been reported. On a controller
switchover or failover we already continue the admin operation today; the only
difference is that AMFD does not reply to the client, because the opId becomes
invalid. We will have to keep doing this in the future even if some MBCSv update
is missing, because the IMM specification has features such as admin operation
continuation (not supported yet). For now, I think this can be raised as a
discussion point, with the same traces, in a separate ticket: how should the
application be handled when MBCSv state is lost between the active director and
the standby director? The first question is how the standby director would even
know that an MBCSv checkpoint was missed.
**SC-Absent cluster**: Since AMF recovers SUSIs from IMM, loss of RTA poses a
major challenge here. Loss of MBCSv may pose a challenge indirectly, because
the controllers may fail in sequence rather than simultaneously. I think the
documentation ticket itself shows that we already knew about this limitation of
reading state from IMM. So we need to decide whether to fix the small issue
reported in this ticket or to develop a full solution for RTA loss. In any case,
I think we cannot give up the approach already taken in #1725, i.e. continuing
the admin operation or recovery if the SCs are unavailable for some duration.
So we should not revert anything.
Thanks,
Praveen
---
**[tickets:#2210] AMFD: Loss of RT attribute update before headless**
**Status:** accepted
**Milestone:** 5.2.FC
**Created:** Mon Nov 28, 2016 10:18 PM UTC by Minh Hon Chau
**Last Updated:** Tue Jan 24, 2017 12:25 PM UTC
**Owner:** Minh Hon Chau
A lost update of the IMM runtime attribute saAmfSIAdminState has been seen in
AMFD just before the cluster goes headless. It results in a coredump after the
headless period.
One scenario is:
- Issue an amf-admin shutdown on the SI and delay the CSI quiescing callback
- Stop the SCs, then release the CSI quiescing callback
- Restart the SCs
Observation: saAmfSIAdminState is read as UNLOCKED while the related SUSI is
QUIESCED, and AMFD dumps core as below:
~~~
Thread 1 (Thread 0x7fec174a0780 (LWP 493)):
#0 0x00000000004fbfd5 in SG_2N::node_fail_si_oper (this=0x24109d0, su=0x2413440) at sg_2n_fsm.cc:3102
        s_susi = 0x8f50000000b
        susi_temp = 0x5fa169
        o_su = 0x2417f98
        __FUNCTION__ = "node_fail_si_oper"
        cb = 0x919240 <_control_block>
#1 0x00000000004fe69c in SG_2N::node_fail (this=0x24109d0, cb=0x919240 <_control_block>, su=0x2413440) at sg_2n_fsm.cc:3469
        a_susi = 0x1
        s_susi = 0x7fffedecd2d0
        o_su = 0x5a50bd <AVD_SU::any_susi_fsm_in(unsigned int)+497>
        flag = 2
        __FUNCTION__ = "node_fail"
        su_ha_state = 0
#2 0x0000000000513010 in AVD_SG::failover_absent_assignment (this=0x24109d0) at sg.cc:2273
        su = @0x2411330: 0x2413440
        __for_range = std::vector of length 2, capacity 2 = {0x2413440, 0x24111e0}
        __for_begin =
        __for_end =
        __FUNCTION__ = "failover_absent_assignment"
#3 0x000000000043be65 in avd_cluster_tmr_init_evh (cb=0x919240 <_control_block>, evt=0x7fec04000df0) at cluster.cc:103
        i_sg = 0x24109d0
        it = {first = "safSg=1,safApp=osaftest", second = }
        __FUNCTION__ = "avd_cluster_tmr_init_evh"
        su = 0x0
        node = 0x240f9b0
~~~
---
Sent from sourceforge.net because [email protected] is
subscribed to https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at
https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a
mailing list, you can unsubscribe from the mailing list.
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets