- **status**: assigned --> review
---
** [tickets:#3162] amfnd: unexpected reboot after headless due to split-brain**
**Status:** review
**Milestone:** 5.20.05
**Created:** Tue Mar 03, 2020 05:35 AM UTC by Thuan Tran
**Last Updated:** Tue Mar 03, 2020 05:35 AM UTC
**Owner:** Thuan Tran
After split-brain (headless enable), node get unexpected reboot order.
~~~
2020-03-02 08:07:04.470 PL-4 osafimmnd[150]: ER Sync-verify: Established node
has different Continuation count (50) than expected (51) - aborting
2020-03-02 08:07:04.497 PL-4 osafamfnd[161]: NO
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' component restart probation timer
started (timeout: 60000000000 ns)
2020-03-02 08:07:04.497 PL-4 osafamfnd[161]: NO Restarting a component of
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
2020-03-02 08:07:04.497 PL-4 osafamfnd[161]: NO
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown'
: Recovery is 'componentRestart'
2020-03-02 08:07:04.630 PL-4 osafckptnd[200]: NO Bad CLM handle. Reinitializing.
2020-03-02 08:07:04.655 PL-4 osafclmna[139]: NO
safNode=PL-4,safCluster=myClmCluster Joined cluster, nodeid=2040f
2020-03-02 08:07:04.709 PL-4 osafamfnd[161]: NO AVD NEW_ACTIVE, adest:1
2020-03-02 08:07:04.730 PL-4 osafamfnd[161]: NO saClmDispatch BAD_HANDLE
2020-03-02 08:07:04.733 PL-4 osafamfnd[161]: NO Synced
SISU:safSi=A,safApp=osaftest,safSu=2,safSg=1,safApp=osaftest <1, 3>
2020-03-02 08:07:04.734 PL-4 osafamfnd[161]: NO Synced
SU:safSu=2,safSg=1,safApp=osaftest <0, 1, 3>
2020-03-02 08:07:04.734 PL-4 osafamfnd[161]: NO Synced
SISU:safSi=NoRed3,safApp=OpenSAF,safSu=PL-4,safSg=NoRed,safApp=OpenSAF <1, 3>
2020-03-02 08:07:04.734 PL-4 osafamfnd[161]: NO Synced
SU:safSu=PL-4,safSg=NoRed,safApp=OpenSAF <0, 1, 3>
2020-03-02 08:07:04.734 PL-4 osafamfnd[161]: NO 6 CSICOMP states sent
2020-03-02 08:07:04.734 PL-4 osafamfnd[161]: NO 6 COMP states sent
2020-03-02 08:07:04.734 PL-4 osafamfnd[161]: NO Sending node up due to
NCSMDS_NEW_ACTIVE
2020-03-02 08:07:04.736 PL-4 osafckptnd[200]: NO CLM selection object was
updated. (15)
2020-03-02 08:07:18.078 PL-4 osafsmfnd[174]: NO MDS mds_svc_event:
NCSMDS_NEW_ACTIVE
2020-03-02 08:07:18.079 PL-4 osafsmfnd[174]: NO MDS mds_svc_event:
NCSMDS_SVC_ID_SMFD dest = 0xf
2020-03-02 08:07:34.741 PL-4 osafamfnd[161]: WA saImmOmInitialize failed. Use
previous value of nodeName.
2020-03-02 08:07:34.744 PL-4 osafamfnd[161]: NO Received reboot order, ordering
reboot now!
2020-03-02 08:07:34.744 PL-4 osafamfnd[161]: Rebooting OpenSAF NodeId = 132111
EE Name = , Reason: Received reboot order, OwnNodeId = 132111, SupervisionTime
= 60
2020-03-02 08:07:34.843 PL-4 osafimmnd[369]: mkfifo already exists:
/var/lib/opensaf/osafimmnd.fifo File exists
2020-03-02 08:07:34.846 PL-4 osafimmnd[369]: Started
2020-03-02 08:07:05.057 SC-1 osafamfd[235]: NO amfnd svc up not yet received
from node 2040f
...
2020-03-02 08:07:29.556 SC-1 osafamfd[235]: NO Cluster startup is done
2020-03-02 08:07:34.743 SC-1 osafamfd[235]: NO Received node_up from 2040f:
msg_id 1
2020-03-02 08:07:34.743 SC-1 osafamfd[235]: WA Sending node reboot order to
node:safAmfNode=PL-4,safAmfCluster=myAmfCluster, due to first node_up_msg after
node sync window
~~~
IMMNDs can expected restart due to new IMMND coord after split-brain on
different partition side during split-brain.
The problem is amfnd reinit CLM then stuck in CLM callback due to init IMM OM,
it cause AMFND delay restart IMMND and resend node_up.
Finally, AMFD order it reboot unexpectedly.
~~~
<143>1 2020-03-02T08:07:04.495525+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3444"] 161:amf/amfnd/clc.cc:3288 T1 CLC CLI
script:'/usr/local/lib/opensaf/clc-cli/osaf-immnd'
<143>1 2020-03-02T08:07:04.49553+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3445"] 161:amf/amfnd/clc.cc:3291 T1 CLC CLI command arguments[1]
='stop'
<141>1 2020-03-02T08:07:04.728127+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3493"] 161:amf/amfnd/main.cc:621 NO saClmDispatch BAD_HANDLE
<143>1 2020-03-02T08:07:04.728223+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3494"] 161:amf/amfnd/main.cc:660 >> avnd_evt_process
<143>1 2020-03-02T08:07:04.728227+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3495"] 349:amf/amfnd/clm.cc:313 >> avnd_clm_init_thread
<143>1 2020-03-02T08:07:04.73634+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3682"] 161:amf/amfnd/clm.cc:200 >> clm_track_cb: '0' '4' '1'
<143>1 2020-03-02T08:07:04.73636+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3687"] 161:amf/amfnd/clm.cc:118 >> clm_to_amf_node
<143>1 2020-03-02T08:07:04.736366+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3688"] 161:imm/agent/imma_om_api.cc:172 >> saImmOmInitialize
<143>1 2020-03-02T08:07:04.736371+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3689"] 161:imm/agent/imma_om_api.cc:198 TR OM client version A.2.15
<143>1 2020-03-02T08:07:04.736375+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3690"] 161:imm/agent/imma_om_api.cc:243 >> initialize_common
<143>1 2020-03-02T08:07:04.73638+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3691"] 161:imm/agent/imma_init.cc:263 >> imma_startup
<143>1 2020-03-02T08:07:04.736389+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3692"] 161:imm/agent/imma_init.cc:273 TR use count 0
<143>1 2020-03-02T08:07:04.736466+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3693"] 161:imm/agent/imma_init.cc:63 >> imma_sync_with_immnd
<143>1 2020-03-02T08:07:04.736472+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3694"] 161:imm/agent/imma_init.cc:71 TR Blocking first client
<143>1 2020-03-02T08:07:04.811047+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3697"] 167:amf/amfnd/clc.cc:630 >> comp_clc_resp_callback
<143>1 2020-03-02T08:07:04.811061+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3698"] 167:amf/amfnd/clc.cc:654 << comp_clc_resp_callback: 1
<143>1 2020-03-02T08:07:34.736998+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3703"] 161:imm/agent/imma_init.cc:81 TR Blocking wait released
<143>1 2020-03-02T08:07:34.737151+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3707"] 161:imm/agent/imma_om_api.cc:254 T2 ERR_TRY_AGAIN: IMMND is
DOWN
<143>1 2020-03-02T08:07:34.737475+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3725"] 161:imm/agent/imma_om_api.cc:229 << saImmOmInitialize
<140>1 2020-03-02T08:07:34.73753+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3726"] 161:amf/amfnd/clm.cc:127 WA saImmOmInitialize failed. Use
previous value of nodeName.
<143>1 2020-03-02T08:07:34.738644+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3784"] 161:amf/amfnd/clc.cc:340 >> avnd_evt_clc_resp_evh
<143>1 2020-03-02T08:07:34.738656+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3785"] 161:amf/amfnd/clc.cc:351 TR
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF', command
type:AVND_COMP_CLC_CMD_TYPE_CLEANUP(3)
<143>1 2020-03-02T08:07:34.738664+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3786"] 161:amf/amfnd/clc.cc:412 TR Comp
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Cleanup.
<143>1 2020-03-02T08:07:34.738671+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3787"] 161:amf/amfnd/clc.cc:776 >> avnd_comp_clc_fsm_run: Comp
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF', Ev '8'
<143>1 2020-03-02T08:07:34.738679+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3788"] 161:amf/amfnd/clc.cc:883 T1
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF':Entering CLC FSM:
presence state:'SA_AMF_PRESENCE_RESTARTING(5)',
Event:'AVND_COMP_CLC_PRES_FSM_EV_CLEANUP_SUCC'
<143>1 2020-03-02T08:07:34.738687+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3789"] 161:amf/amfnd/clc.cc:1969 >>
avnd_comp_clc_xxxing_cleansucc_hdler:
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF': Cleanup success event in
the instantiating/restarting state
<143>1 2020-03-02T08:07:34.738695+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3790"] 161:amf/amfnd/compdb.cc:1714 >> avnd_comp_config_reinit:
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
<143>1 2020-03-02T08:07:34.738702+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3791"] 161:amf/amfnd/compdb.cc:1770 << avnd_comp_config_reinit: 0
<143>1 2020-03-02T08:07:34.738709+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3792"] 161:amf/amfnd/comp.cc:2636 >>
avnd_comp_cmplete_all_assignment: Comp
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
<143>1 2020-03-02T08:07:34.738716+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3793"] 161:amf/amfnd/comp.cc:2714 <<
avnd_comp_cmplete_all_assignment
<143>1 2020-03-02T08:07:34.738723+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3794"] 161:amf/amfnd/clc.cc:1987 TR inst_retry_cnt:0,
inst_retry_max:2
<143>1 2020-03-02T08:07:34.738731+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3795"] 161:amf/amfnd/clc.cc:3111 >> avnd_comp_clc_cmd_execute:
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF':CLC CLI command
type:'AVND_COMP_CLC_CMD_TYPE_INSTANTIATE(1)'
<143>1 2020-03-02T08:07:34.738743+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3796"] 161:amf/amfnd/clc.cc:3288 T1 CLC CLI
script:'/usr/local/lib/opensaf/clc-cli/osaf-immnd'
<143>1 2020-03-02T08:07:34.73875+01:00 PL-4 osafamfnd 161 osafamfnd [meta
sequenceId="3797"] 161:amf/amfnd/clc.cc:3291 T1 CLC CLI command arguments[1]
='instantiate'
~~~
---
Sent from sourceforge.net because [email protected] is
subscribed to https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at
https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a
mailing list, you can unsubscribe from the mailing list.
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets