- **status**: accepted --> review
---
**[tickets:#2510] amfd: payloads rebooted when recovering from SC absence**
**Status:** review
**Milestone:** 5.17.08
**Created:** Fri Jun 23, 2017 01:34 AM UTC by Gary Lee
**Last Updated:** Fri Jun 23, 2017 01:35 AM UTC
**Owner:** Gary Lee
When recovering from SC absence, all PLs are sometimes rebooted because amfd
does not process node_up messages in a timely manner.
Normally, AMFD should receive a node_up from each PL every 1s during the sync
window, e.g. "NO Received node_up from 2040f: msg_id 1".
When the problem occurs, AMFD does not log the node_up messages for periods of
up to 15-20 seconds. After applying the following patch, it can be seen that
the node_up messages are in fact received by AMFD in the MDS thread, but are
not processed by the main thread during that period.
```
     default:
       evt->rcv_evt = static_cast<AVD_EVT_TYPE>(
           (rcv_msg->msg_type - AVSV_N2D_NODE_UP_MSG) + AVD_EVT_NODE_UP_MSG);
       break;
   }

   osafassert((AVD_EVT_INVALID < evt->rcv_evt) && (evt->rcv_evt < AVD_EVT_MAX));

   evt->info.avnd_msg = rcv_msg;
+  if (evt->rcv_evt == AVD_EVT_NODE_UP_MSG) {
+    LOG_NO("MDS Thread: Received node_up from %x: msg_id %u",
+           evt->info.avnd_msg->msg_info.n2d_node_up.node_id,
+           evt->info.avnd_msg->msg_info.n2d_node_up.msg_id);
+  }

   if (m_NCS_IPC_SEND(&cb->avd_mbx, evt, NCS_IPC_PRIORITY_HIGH) !=
       NCSCC_RC_SUCCESS) {
     LOG_ER("%s: ncs_ipc_send failed", __FUNCTION__);
     avsv_dnd_msg_free(rcv_msg);
     evt->info.avnd_msg = nullptr;
     delete evt;
     TRACE_LEAVE();
     return NCSCC_RC_FAILURE;
   }
```
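The patch above only logs on the MDS thread side. To measure how long an event
waits in the mailbox before the main thread handles it, one option is to stamp
each event when it is posted and compute the wait when it is dequeued. The
following is a standalone sketch of that idea, not OpenSAF code: `Mailbox`,
`Event` and `Dequeued` are hypothetical stand-ins for `cb->avd_mbx` and the amfd
event type, and the time source is passed in so the behaviour can be checked
without real threads.

```cpp
#include <chrono>
#include <cstdint>
#include <deque>
#include <mutex>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for an amfd event carrying a node_up message.
struct Event {
  std::uint32_t node_id;
  std::uint32_t msg_id;
  Clock::time_point posted_at;  // set by the posting (MDS) thread
};

// An event together with the time it spent queued.
struct Dequeued {
  Event evt;
  std::chrono::microseconds waited;
};

// Hypothetical FIFO mailbox shared between a posting thread and a main thread.
class Mailbox {
 public:
  // Called on the posting side; records when the event entered the queue.
  void Post(std::uint32_t node_id, std::uint32_t msg_id,
            Clock::time_point now = Clock::now()) {
    std::lock_guard<std::mutex> lock(mu_);
    queue_.push_back(Event{node_id, msg_id, now});
  }

  // Called on the main-thread side. Returns false if the mailbox is empty,
  // otherwise fills *out with the oldest event and how long it waited.
  bool Take(Dequeued* out, Clock::time_point now = Clock::now()) {
    std::lock_guard<std::mutex> lock(mu_);
    if (queue_.empty()) return false;
    out->evt = queue_.front();
    queue_.pop_front();
    out->waited = std::chrono::duration_cast<std::chrono::microseconds>(
        now - out->evt.posted_at);
    return true;
  }

 private:
  std::mutex mu_;
  std::deque<Event> queue_;
};
```

In a real build the main thread could log `waited` next to the existing
"Received node_up" message, which would show directly whether the delay is
spent in the mailbox or elsewhere.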
```
var/log/opensaf/osafamfd:Jun 20 17:20:32.498510 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:32.499659 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:32.516707 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:33.599087 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:33.599784 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:33.618200 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:34.699435 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:34.700674 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:34.720039 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:35.799851 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:35.801838 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:35.821058 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:36.900386 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:36.902192 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:36.922864 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:38.000570 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:38.002417 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:38.023179 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:39.101071 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:39.103714 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:39.124949 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:40.201609 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:40.204156 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:40.244475 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:41.301996 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:41.304880 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:41.345560 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:42.402541 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:42.405582 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:42.445846 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:43.502786 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:43.505630 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:43.546256 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:44.603419 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:44.607302 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:44.647594 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:45.703701 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:45.707718 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:45.748243 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:46.054654 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0296] >> avd_node_up_evh: from 2030f, safAmfNode=PL-3,safAmfCluster=myAmfCluster
var/log/opensaf/osafamfd:Jun 20 17:20:46.054675 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0254] NO Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:46.054682 osafamfd [10514:10514:src/amf/amfd/util.cc:0203] >> avd_snd_node_up_msg
var/log/opensaf/osafamfd:Jun 20 17:20:46.054695 osafamfd [10514:10514:src/amf/amfd/util.cc:0236] << avd_snd_node_up_msg
var/log/opensaf/osafamfd:Jun 20 17:20:46.054734 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0439] WA Sending node reboot order to node:safAmfNode=PL-3,safAmfCluster=myAmfCluster, due to late node_up_msg after cluster startup timeout
var/log/opensaf/osafamfd:Jun 20 17:20:46.054749 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0534] << avd_node_up_evh
var/log/opensaf/osafamfd:Jun 20 17:20:46.055356 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0296] >> avd_node_up_evh: from 2040f, safAmfNode=PL-4,safAmfCluster=myAmfCluster
var/log/opensaf/osafamfd:Jun 20 17:20:46.055378 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0254] NO Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:46.055387 osafamfd [10514:10514:src/amf/amfd/util.cc:0203] >> avd_snd_node_up_msg
var/log/opensaf/osafamfd:Jun 20 17:20:46.055402 osafamfd [10514:10514:src/amf/amfd/util.cc:0236] << avd_snd_node_up_msg
var/log/opensaf/osafamfd:Jun 20 17:20:46.055428 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0439] WA Sending node reboot order to node:safAmfNode=PL-4,safAmfCluster=myAmfCluster, due to late node_up_msg after cluster startup timeout
```
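The gap between the MDS thread and the main thread can be read off the log
above by comparing, per node, the first "MDS Thread: Received node_up" entry
with the first main-thread "NO Received node_up" entry. A small helper along
these lines could do that automatically. It is only a sketch: `NodeUpDelays` is
a hypothetical name, it assumes each log entry has been joined onto a single
line, that timestamps look like `HH:MM:SS.uuuuuu`, and that the log does not
cross midnight.

```cpp
#include <cstdint>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Returns, for each node id, the number of seconds between the first node_up
// logged by the MDS thread and the first node_up logged by the main thread.
// Nodes with no main-thread entry (or no earlier MDS entry) are omitted.
std::map<std::string, double> NodeUpDelays(
    const std::vector<std::string>& lines) {
  // Captures: hours, minutes, seconds, microseconds, node id (hex).
  static const std::regex re(
      R"((\d\d):(\d\d):(\d\d)\.(\d{6}).*Received node_up from ([0-9a-f]+):)");

  std::map<std::string, std::int64_t> first_mds_us;
  std::map<std::string, double> delays;

  for (const std::string& line : lines) {
    std::smatch m;
    if (!std::regex_search(line, m, re)) continue;

    // Timestamp as microseconds since midnight.
    const std::int64_t t_us =
        ((std::stoll(m[1].str()) * 60 + std::stoll(m[2].str())) * 60 +
         std::stoll(m[3].str())) * 1000000 + std::stoll(m[4].str());
    const std::string node = m[5].str();

    if (line.find("MDS Thread:") != std::string::npos) {
      // emplace() keeps the earliest timestamp already stored for this node.
      first_mds_us.emplace(node, t_us);
    } else if (first_mds_us.count(node) != 0 && delays.count(node) == 0) {
      delays[node] = static_cast<double>(t_us - first_mds_us[node]) / 1e6;
    }
  }
  return delays;
}
```

For the entries above, 2030f is first seen by the MDS thread at 17:20:32.499659
and by the main thread at 17:20:46.054675, a delay of about 13.56 seconds;
2040f shows about 13.54 seconds (17:20:32.516707 to 17:20:46.055378).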