- **status**: assigned --> review


---

** [tickets:#3162] amfnd: unexpected reboot after headless due to split-brain**

**Status:** review
**Milestone:** 5.20.05
**Created:** Tue Mar 03, 2020 05:35 AM UTC by Thuan Tran
**Last Updated:** Tue Mar 03, 2020 05:35 AM UTC
**Owner:** Thuan Tran


After split-brain (headless enable), node get unexpected reboot order.
~~~
2020-03-02 08:07:04.470 PL-4 osafimmnd[150]: ER Sync-verify: Established node 
has different Continuation count (50) than expected (51) - aborting
2020-03-02 08:07:04.497 PL-4 osafamfnd[161]: NO 
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' component restart probation timer 
started (timeout: 60000000000 ns)
2020-03-02 08:07:04.497 PL-4 osafamfnd[161]: NO Restarting a component of 
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
2020-03-02 08:07:04.497 PL-4 osafamfnd[161]: NO 
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'
2020-03-02 08:07:04.630 PL-4 osafckptnd[200]: NO Bad CLM handle. Reinitializing.
2020-03-02 08:07:04.655 PL-4 osafclmna[139]: NO 
safNode=PL-4,safCluster=myClmCluster Joined cluster, nodeid=2040f
2020-03-02 08:07:04.709 PL-4 osafamfnd[161]: NO AVD NEW_ACTIVE, adest:1
2020-03-02 08:07:04.730 PL-4 osafamfnd[161]: NO saClmDispatch BAD_HANDLE
2020-03-02 08:07:04.733 PL-4 osafamfnd[161]: NO Synced 
SISU:safSi=A,safApp=osaftest,safSu=2,safSg=1,safApp=osaftest <1, 3>
2020-03-02 08:07:04.734 PL-4 osafamfnd[161]: NO Synced 
SU:safSu=2,safSg=1,safApp=osaftest <0, 1, 3>
2020-03-02 08:07:04.734 PL-4 osafamfnd[161]: NO Synced 
SISU:safSi=NoRed3,safApp=OpenSAF,safSu=PL-4,safSg=NoRed,safApp=OpenSAF <1, 3>
2020-03-02 08:07:04.734 PL-4 osafamfnd[161]: NO Synced 
SU:safSu=PL-4,safSg=NoRed,safApp=OpenSAF <0, 1, 3>
2020-03-02 08:07:04.734 PL-4 osafamfnd[161]: NO 6 CSICOMP states sent
2020-03-02 08:07:04.734 PL-4 osafamfnd[161]: NO 6 COMP states sent
2020-03-02 08:07:04.734 PL-4 osafamfnd[161]: NO Sending node up due to 
NCSMDS_NEW_ACTIVE
2020-03-02 08:07:04.736 PL-4 osafckptnd[200]: NO CLM selection object was 
updated. (15)
2020-03-02 08:07:18.078 PL-4 osafsmfnd[174]: NO MDS mds_svc_event: 
NCSMDS_NEW_ACTIVE
2020-03-02 08:07:18.079 PL-4 osafsmfnd[174]: NO MDS mds_svc_event: 
NCSMDS_SVC_ID_SMFD dest = 0xf
2020-03-02 08:07:34.741 PL-4 osafamfnd[161]: WA saImmOmInitialize failed. Use 
previous value of nodeName.
2020-03-02 08:07:34.744 PL-4 osafamfnd[161]: NO Received reboot order, ordering 
reboot now!
2020-03-02 08:07:34.744 PL-4 osafamfnd[161]: Rebooting OpenSAF NodeId = 132111 
EE Name = , Reason: Received reboot order, OwnNodeId = 132111, SupervisionTime 
= 60
2020-03-02 08:07:34.843 PL-4 osafimmnd[369]: mkfifo already exists: 
/var/lib/opensaf/osafimmnd.fifo File exists
2020-03-02 08:07:34.846 PL-4 osafimmnd[369]: Started

2020-03-02 08:07:05.057 SC-1 osafamfd[235]: NO amfnd svc up not yet received 
from node 2040f
...
2020-03-02 08:07:29.556 SC-1 osafamfd[235]: NO Cluster startup is done
2020-03-02 08:07:34.743 SC-1 osafamfd[235]: NO Received node_up from 2040f: 
msg_id 1
2020-03-02 08:07:34.743 SC-1 osafamfd[235]: WA Sending node reboot order to 
node:safAmfNode=PL-4,safAmfCluster=myAmfCluster, due to first node_up_msg after 
node sync window
~~~
IMMNDs can expected restart due to new IMMND coord after split-brain on 
different partition side during split-brain.
The problem is amfnd reinit CLM then stuck in CLM callback due to init IMM OM, 
it cause AMFND delay restart IMMND and resend node_up.
Finally, AMFD order it reboot unexpectedly.
~~~
<143>1 2020-03-02T08:07:04.495525+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3444"] 161:amf/amfnd/clc.cc:3288 T1 CLC CLI 
script:'/usr/local/lib/opensaf/clc-cli/osaf-immnd'
<143>1 2020-03-02T08:07:04.49553+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3445"] 161:amf/amfnd/clc.cc:3291 T1 CLC CLI command arguments[1] 
='stop'

<141>1 2020-03-02T08:07:04.728127+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3493"] 161:amf/amfnd/main.cc:621 NO saClmDispatch BAD_HANDLE
<143>1 2020-03-02T08:07:04.728223+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3494"] 161:amf/amfnd/main.cc:660 >> avnd_evt_process 
<143>1 2020-03-02T08:07:04.728227+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3495"] 349:amf/amfnd/clm.cc:313 >> avnd_clm_init_thread 

<143>1 2020-03-02T08:07:04.73634+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3682"] 161:amf/amfnd/clm.cc:200 >> clm_track_cb: '0' '4' '1'
<143>1 2020-03-02T08:07:04.73636+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3687"] 161:amf/amfnd/clm.cc:118 >> clm_to_amf_node 
<143>1 2020-03-02T08:07:04.736366+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3688"] 161:imm/agent/imma_om_api.cc:172 >> saImmOmInitialize 
<143>1 2020-03-02T08:07:04.736371+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3689"] 161:imm/agent/imma_om_api.cc:198 TR OM client version A.2.15
<143>1 2020-03-02T08:07:04.736375+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3690"] 161:imm/agent/imma_om_api.cc:243 >> initialize_common 
<143>1 2020-03-02T08:07:04.73638+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3691"] 161:imm/agent/imma_init.cc:263 >> imma_startup 
<143>1 2020-03-02T08:07:04.736389+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3692"] 161:imm/agent/imma_init.cc:273 TR use count 0
<143>1 2020-03-02T08:07:04.736466+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3693"] 161:imm/agent/imma_init.cc:63 >> imma_sync_with_immnd 
<143>1 2020-03-02T08:07:04.736472+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3694"] 161:imm/agent/imma_init.cc:71 TR Blocking first client

<143>1 2020-03-02T08:07:04.811047+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3697"] 167:amf/amfnd/clc.cc:630 >> comp_clc_resp_callback 
<143>1 2020-03-02T08:07:04.811061+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3698"] 167:amf/amfnd/clc.cc:654 << comp_clc_resp_callback: 1

<143>1 2020-03-02T08:07:34.736998+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3703"] 161:imm/agent/imma_init.cc:81 TR Blocking wait released
<143>1 2020-03-02T08:07:34.737151+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3707"] 161:imm/agent/imma_om_api.cc:254 T2 ERR_TRY_AGAIN: IMMND is 
DOWN

<143>1 2020-03-02T08:07:34.737475+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3725"] 161:imm/agent/imma_om_api.cc:229 << saImmOmInitialize 
<140>1 2020-03-02T08:07:34.73753+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3726"] 161:amf/amfnd/clm.cc:127 WA saImmOmInitialize failed. Use 
previous value of nodeName.

<143>1 2020-03-02T08:07:34.738644+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3784"] 161:amf/amfnd/clc.cc:340 >> avnd_evt_clc_resp_evh 
<143>1 2020-03-02T08:07:34.738656+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3785"] 161:amf/amfnd/clc.cc:351 TR 
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF', command 
type:AVND_COMP_CLC_CMD_TYPE_CLEANUP(3)
<143>1 2020-03-02T08:07:34.738664+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3786"] 161:amf/amfnd/clc.cc:412 TR Comp 
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Cleanup.
<143>1 2020-03-02T08:07:34.738671+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3787"] 161:amf/amfnd/clc.cc:776 >> avnd_comp_clc_fsm_run: Comp 
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF', Ev '8'
<143>1 2020-03-02T08:07:34.738679+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3788"] 161:amf/amfnd/clc.cc:883 T1 
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF':Entering CLC FSM: 
presence state:'SA_AMF_PRESENCE_RESTARTING(5)', 
Event:'AVND_COMP_CLC_PRES_FSM_EV_CLEANUP_SUCC'
<143>1 2020-03-02T08:07:34.738687+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3789"] 161:amf/amfnd/clc.cc:1969 >> 
avnd_comp_clc_xxxing_cleansucc_hdler: 
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF': Cleanup success event in 
the instantiating/restarting state
<143>1 2020-03-02T08:07:34.738695+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3790"] 161:amf/amfnd/compdb.cc:1714 >> avnd_comp_config_reinit: 
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
<143>1 2020-03-02T08:07:34.738702+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3791"] 161:amf/amfnd/compdb.cc:1770 << avnd_comp_config_reinit: 0
<143>1 2020-03-02T08:07:34.738709+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3792"] 161:amf/amfnd/comp.cc:2636 >> 
avnd_comp_cmplete_all_assignment: Comp 
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
<143>1 2020-03-02T08:07:34.738716+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3793"] 161:amf/amfnd/comp.cc:2714 << 
avnd_comp_cmplete_all_assignment 
<143>1 2020-03-02T08:07:34.738723+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3794"] 161:amf/amfnd/clc.cc:1987 TR inst_retry_cnt:0, 
inst_retry_max:2
<143>1 2020-03-02T08:07:34.738731+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3795"] 161:amf/amfnd/clc.cc:3111 >> avnd_comp_clc_cmd_execute: 
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF':CLC CLI command 
type:'AVND_COMP_CLC_CMD_TYPE_INSTANTIATE(1)'
<143>1 2020-03-02T08:07:34.738743+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3796"] 161:amf/amfnd/clc.cc:3288 T1 CLC CLI 
script:'/usr/local/lib/opensaf/clc-cli/osaf-immnd'
<143>1 2020-03-02T08:07:34.73875+01:00 PL-4 osafamfnd 161 osafamfnd [meta 
sequenceId="3797"] 161:amf/amfnd/clc.cc:3291 T1 CLC CLI command arguments[1] 
='instantiate'
~~~


---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to