- **status**: review --> fixed


---

** [tickets:#3281] mds: Sending wrong NO_ACTIVE after split brain detection**

**Status:** fixed
**Milestone:** 5.21.12
**Created:** Mon Sep 06, 2021 01:47 AM UTC by Hieu Hong Hoang
**Last Updated:** Mon Nov 01, 2021 03:04 AM UTC
**Owner:** Hieu Hong Hoang
**Attachments:**

- 
[dump_mds_5.21.06.patch](https://sourceforge.net/p/opensaf/tickets/3281/attachment/dump_mds_5.21.06.patch)
 (3.5 kB; application/octet-stream)
- 
[mds.log.1](https://sourceforge.net/p/opensaf/tickets/3281/attachment/mds.log.1)
 (12.0 MB; application/octet-stream)


Configuration:
- Cluster with 10 SCs, allow sc absence.
- Split cluster to four partitions:  [[SC-1, SC-2], [SC-3, SC-4, SC-5, SC-6], 
[SC-7, SC-8, SC-9], [SC-10]]
- Role of SCs after network splits:  [[SC-1(ACT), SC-2(STB)], [SC-3(ACT), SC-4, 
SC-5(STB), SC-6], [SC-7(STB), SC-8(ATC), SC-9], [SC-10(ATC)]]
- Network merges

Observation:
- All active and standby SCs rebooted due to split brain detected except SCs in 
partition 1. The SCs in partition 1 don't reboot because the active and standby 
SCs in other partitions rebooted too fast.  
- Ntf agent in SC1 fails to send notification to ntf server and it will not 
recover.

~~~
<143>1 2021-09-04T07:11:56.225221+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="136348"] 439:amf/amfd/imm.cc:419 >> execute 
<143>1 2021-09-04T07:11:56.225223+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="136349"] 439:amf/amfd/ntf.cc:804 >> exec: Ntf Type:3000, sent 
status:0
<143>1 2021-09-04T07:11:56.225227+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="136350"] 439:amf/amfd/ntf.cc:491 >> avd_try_send_notification: Ntf 
Type:3000, sent status:0
<143>1 2021-09-04T07:11:56.225231+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="136351"] 439:ntf/agent/ntfa_api.c:2016 >> saNtfNotificationSend 
<143>1 2021-09-04T07:11:56.225235+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="136352"] 439:ntf/agent/ntfa_api.c:62 TR NTFS server is unavailable
<143>1 2021-09-04T07:11:56.225238+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="136353"] 439:ntf/agent/ntfa_api.c:2260 << saNtfNotificationSend 
<143>1 2021-09-04T07:11:56.225241+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="136354"] 439:amf/amfd/ntf.cc:513 TR Notification Send unsuccesful 
TRY_AGAIN or TIMEOUT rc:6
<143>1 2021-09-04T07:11:56.225243+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="136355"] 439:amf/amfd/ntf.cc:532 << avd_try_send_notification 
<143>1 2021-09-04T07:11:56.225246+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="136356"] 439:amf/amfd/ntf.cc:811 TR TRY-AGAIN
<143>1 2021-09-04T07:11:56.225249+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="136357"] 439:amf/amfd/ntf.cc:822 << exec 
<143>1 2021-09-04T07:11:56.225252+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="136358"] 439:amf/amfd/imm.cc:427 << execute: 2
...
<143>1 2021-09-04T07:26:00.185418+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="191040"] 439:amf/amfd/imm.cc:419 >> execute 
<143>1 2021-09-04T07:26:00.185465+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="191041"] 439:amf/amfd/ntf.cc:804 >> exec: Ntf Type:3000, sent 
status:0
<143>1 2021-09-04T07:26:00.185475+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="191042"] 439:amf/amfd/ntf.cc:491 >> avd_try_send_notification: Ntf 
Type:3000, sent status:0
<143>1 2021-09-04T07:26:00.185485+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="191043"] 439:ntf/agent/ntfa_api.c:2016 >> saNtfNotificationSend 
<143>1 2021-09-04T07:26:00.185497+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="191044"] 439:ntf/agent/ntfa_api.c:59 TR NTFS server is down
<143>1 2021-09-04T07:26:00.185505+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="191045"] 439:ntf/agent/ntfa_api.c:2260 << saNtfNotificationSend 
<143>1 2021-09-04T07:26:00.185513+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="191046"] 439:amf/amfd/ntf.cc:513 TR Notification Send unsuccesful 
TRY_AGAIN or TIMEOUT rc:6
<143>1 2021-09-04T07:26:00.18552+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="191047"] 439:amf/amfd/ntf.cc:532 << avd_try_send_notification 
<143>1 2021-09-04T07:26:00.185528+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="191048"] 439:amf/amfd/ntf.cc:811 TR TRY-AGAIN
<143>1 2021-09-04T07:26:00.185536+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="191049"] 439:amf/amfd/ntf.cc:822 << exec 
<143>1 2021-09-04T07:26:00.185543+02:00 SC-1 osafamfd 439 osafamfd [meta 
sequenceId="191050"] 439:amf/amfd/imm.cc:427 << execute: 2
~~~

Reason:
- Before the active SC-3 rebooted, SC-1 still had enough time to connect to the 
ntf server in SC-3. When SC-3 rebooted, the ntf agent in SC-1 received a 
NO_ACTIVE message of NTFS service. Actually, the ntf server in SC-1 is still in 
active state.
- The following mds log is generated by the opensaf code which applied the 
patch dump_mds_5.21.06.patch.

+ The ntf agent in SC-1 detects the ntf server in SC-3 is up. Mds updates the 
active destination of NTFS service to SC-3.
~~~
<142>1 2021-09-04T07:10:34.384094+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307776"] MDTM: Processing pollin events
<142>1 2021-09-04T07:10:34.3841+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307777"] MDTM: Received SVC event
<142>1 2021-09-04T07:10:34.384102+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307778"] MDTM: No entry in ref tbl so dropping the recd event
<142>1 2021-09-04T07:10:34.384105+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307779"] MDTM: Processing pollin events
<142>1 2021-09-04T07:10:34.384111+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307780"] MDTM: Received SVC event
<143>1 2021-09-04T07:10:34.384113+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307781"] >> mds_mcm_svc_up
<143>1 2021-09-04T07:10:34.384115+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307782"] MCM:API: LOCAL SVC INFO  : svc_id = NTFA(29) | PWE id = 1 
| VDEST id = 65535 |
<143>1 2021-09-04T07:10:34.384117+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307783"] MCM:API: REMOTE SVC INFO : svc_id = NTFS(28) | PWE id = 1 
| VDEST id = 14 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 |
<143>1 2021-09-04T07:10:34.38412+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307784"] >> mds_svc_tbl_query
<143>1 2021-09-04T07:10:34.384122+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307785"] << mds_svc_tbl_query
<143>1 2021-09-04T07:10:34.384127+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307786"] >> mds_subtn_tbl_get_details
<143>1 2021-09-04T07:10:34.38413+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307787"] << mds_subtn_tbl_get_details
<143>1 2021-09-04T07:10:34.384132+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307788"] >> mds_mcm_validate_scope
<143>1 2021-09-04T07:10:34.384135+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307789"] << mds_mcm_validate_scope
<143>1 2021-09-04T07:10:34.384138+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307790"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2021-09-04T07:10:34.384141+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307791"] MDS:DB: Subscription Result not present
<143>1 2021-09-04T07:10:34.384144+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307792"] << mds_get_subtn_res_tbl_by_adest
<143>1 2021-09-04T07:10:34.384146+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307793"] >> mds_subtn_res_tbl_get
<143>1 2021-09-04T07:10:34.384149+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307794"] << mds_subtn_res_tbl_get
<143>1 2021-09-04T07:10:34.384152+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307795"] >> mds_subtn_res_tbl_add
<143>1 2021-09-04T07:10:34.384155+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307796"] >> dump_subt_res_tree
...
<143>1 2021-09-04T07:10:34.384267+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307831"] sub_svc_id 28 adest 0 vdest_id 14 next_active_adest 
2010f0000019d
<143>1 2021-09-04T07:10:34.384269+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307832"] sub_svc_id 28 adest 2020f0000018e vdest_id 14 
<143>1 2021-09-04T07:10:34.384272+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307833"] sub_svc_id 28 adest 2010f0000019d vdest_id 14 
...
<143>1 2021-09-04T07:10:34.384287+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307838"] << dump_subt_res_tree
<143>1 2021-09-04T07:10:34.38429+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307839"] MDS:DB: adest_details: <rem_node[0x2030f]:dest_pid[398]> 
<143>1 2021-09-04T07:10:34.384298+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307840"] << get_subtn_adest_details
<143>1 2021-09-04T07:10:34.384301+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307841"] MDS:DB: sub_adest_details: 
<rem_node[0x2030f]:dest_pid[398]>
<143>1 2021-09-04T07:10:34.384304+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307842"] >> dump_subt_res_tree
... 
<143>1 2021-09-04T07:10:34.384462+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307877"] sub_svc_id 28 adest 0 vdest_id 14 next_active_adest 
2030f0000018e
<143>1 2021-09-04T07:10:34.384465+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307878"] sub_svc_id 28 adest 2020f0000018e vdest_id 14 
<143>1 2021-09-04T07:10:34.384468+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307879"] sub_svc_id 28 adest 2030f0000018e vdest_id 14 
<143>1 2021-09-04T07:10:34.384471+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307880"] sub_svc_id 28 adest 2010f0000019d vdest_id 14 
...
<143>1 2021-09-04T07:10:34.38449+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307885"] << dump_subt_res_tree
<143>1 2021-09-04T07:10:34.384493+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307886"] << mds_subtn_res_tbl_add
<143>1 2021-09-04T07:10:34.384496+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="307887"] << mds_mcm_svc_up
~~~
+ The ntf agent in SC-1 detected the ntf server in SC-3 was down.
~~~
<142>1 2021-09-04T07:10:34.396969+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308753"] MDTM: Processing pollin events
<142>1 2021-09-04T07:10:34.396975+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308754"] MDTM: Received SVC event
<143>1 2021-09-04T07:10:34.396979+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308755"] >> mds_mcm_svc_down
<143>1 2021-09-04T07:10:34.396983+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308756"] MCM:API: LOCAL SVC INFO  : svc_id = NTFA(29) | PWE id = 1 
| VDEST id = 65535 |
<143>1 2021-09-04T07:10:34.396986+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308757"] MCM:API: REMOTE SVC INFO : svc_id = NTFS(28) | PWE id = 1 
| VDEST id = 14 | POLICY = 2 | SCOPE = 4 | ROLE = 1 | MY_PCON = 0 |
<143>1 2021-09-04T07:10:34.396989+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308758"] >> mds_svc_tbl_query
<143>1 2021-09-04T07:10:34.396992+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308759"] << mds_svc_tbl_query
<143>1 2021-09-04T07:10:34.396995+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308760"] >> mds_subtn_tbl_get_details
<143>1 2021-09-04T07:10:34.396998+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308761"] << mds_subtn_tbl_get_details
<143>1 2021-09-04T07:10:34.397001+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308762"] >> mds_mcm_validate_scope
<143>1 2021-09-04T07:10:34.397003+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308763"] << mds_mcm_validate_scope
<143>1 2021-09-04T07:10:34.397006+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308764"] >> mds_get_subtn_res_tbl_by_adest
<143>1 2021-09-04T07:10:34.39701+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308765"] << mds_get_subtn_res_tbl_by_adest
<143>1 2021-09-04T07:10:34.397014+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308766"] >> mds_subtn_res_tbl_get_by_adest
<143>1 2021-09-04T07:10:34.397016+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308767"] << mds_subtn_res_tbl_get_by_adest
<143>1 2021-09-04T07:10:34.397019+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308768"] >> mds_subtn_res_tbl_get
<143>1 2021-09-04T07:10:34.397022+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308769"] << mds_subtn_res_tbl_get
<143>1 2021-09-04T07:10:34.397025+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308770"] >> mds_subtn_res_tbl_del
<143>1 2021-09-04T07:10:34.397029+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308771"] >> dump_subt_res_tree...
<143>1 2021-09-04T07:10:34.397157+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308805"] sub_svc_id 28 adest 0 vdest_id 14 next_active_adest 
2030f0000018e
<143>1 2021-09-04T07:10:34.397159+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308806"] sub_svc_id 28 adest 2020f0000018e vdest_id 14 
<143>1 2021-09-04T07:10:34.397162+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308807"] sub_svc_id 28 adest 2030f0000018e vdest_id 14 
<143>1 2021-09-04T07:10:34.397165+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308808"] sub_svc_id 28 adest 2010f0000019d vdest_id 14 
...
<143>1 2021-09-04T07:10:34.397186+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308813"] << dump_subt_res_tree
<143>1 2021-09-04T07:10:34.397189+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308814"] >> dump_subt_res_tree
...
<143>1 2021-09-04T07:10:34.397314+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308848"] sub_svc_id 28 adest 0 vdest_id 14 next_active_adest 
2030f0000018e
<143>1 2021-09-04T07:10:34.397321+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308849"] sub_svc_id 28 adest 2020f0000018e vdest_id 14 
<143>1 2021-09-04T07:10:34.397325+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308850"] sub_svc_id 28 adest 2010f0000019d vdest_id 14 
...
<143>1 2021-09-04T07:10:34.39735+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308855"] << dump_subt_res_tree
<143>1 2021-09-04T07:10:34.397353+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308856"] << mds_subtn_res_tbl_del
<143>1 2021-09-04T07:10:34.397356+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308857"] >> mds_subtn_res_tbl_getnext_any
<143>1 2021-09-04T07:10:34.397365+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308858"] << mds_subtn_res_tbl_getnext_any
~~~
+ Mds removed the active destination of NTFS service although the NTFS service 
in SC-1 was still active.
~~~
<143>1 2021-09-04T07:10:34.39737+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308859"] >> mds_subtn_res_tbl_remove_active
<143>1 2021-09-04T07:10:34.397373+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308860"] 
MCM_DB:RemoveActive:TimerStart:AwaitActiveTmri:Hdl=0xfc300003:SvcHdl=562945658454045:sbscr-svcid=NTFS(28),vdest=14
<143>1 2021-09-04T07:10:34.397377+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308861"] << mds_subtn_res_tbl_remove_active
<143>1 2021-09-04T07:10:34.397379+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308862"] >> mds_mcm_user_event_callback
<143>1 2021-09-04T07:10:34.397382+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308863"] >> mds_svc_tbl_get
<143>1 2021-09-04T07:10:34.397385+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308864"] << mds_svc_tbl_get
<143>1 2021-09-04T07:10:34.397387+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308865"] >> mds_subtn_tbl_get
<143>1 2021-09-04T07:10:34.397395+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308866"] << mds_subtn_tbl_get
<143>1 2021-09-04T07:10:34.397423+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308867"] << mds_mcm_user_event_callback
<142>1 2021-09-04T07:10:34.397425+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308868"] MCM:API: svc_down : svc_id = NTFA(29) on DEST id = 65535 
got NO_ACTIVE for svc_id = NTFS(28) on Vdest id = 14 Adest = 
<rem_node[0x2030f]:dest_pid[398]>, rem_svc_pvt_ver=1
<143>1 2021-09-04T07:10:34.397429+02:00 SC-1 osafamfd 439 mds.log [meta 
sequenceId="308869"] << mds_mcm_svc_down
~~~
 



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
_______________________________________________
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to