---
**[tickets:#2510] amfd: payloads rebooted when recovering from SC absence**
**Status:** accepted
**Milestone:** 5.17.08
**Created:** Fri Jun 23, 2017 01:34 AM UTC by Gary Lee
**Last Updated:** Fri Jun 23, 2017 01:34 AM UTC
**Owner:** Gary Lee

When recovering from SC absence, all PLs are sometimes rebooted because their
node_up messages are not processed by amfd in time.

Normally, during the sync window, AMFD should receive a node_up from each PL
every second, e.g. "NO Received node_up from 2040f: msg_id 1".
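
A minimal, hypothetical model of the timing rule involved may help. `SyncWindow`, `NodeUpAction` and the window length are illustrative assumptions, not amfd code; only the behaviour that a node_up handled after the cluster startup timeout leads to a reboot order is taken from the amfd log in this ticket.

```cpp
#include <chrono>
#include <cstdint>
#include <set>

// Hypothetical model, NOT amfd's implementation: while recovering from SC
// absence, the director accepts node_up messages only until a deadline
// (modelled on the "cluster startup timeout" named in the amfd log).
// PLs resend node_up about once per second until they get an answer.
enum class NodeUpAction { kAccept, kRebootLate };

class SyncWindow {
 public:
  using Clock = std::chrono::steady_clock;

  SyncWindow(Clock::time_point start, Clock::duration length)
      : deadline_(start + length) {}

  // The verdict depends on when the main thread *processes* the node_up,
  // not on when the MDS thread received it from the network.
  NodeUpAction OnNodeUp(uint32_t node_id, Clock::time_point processed_at) {
    if (members_.count(node_id) != 0) {
      return NodeUpAction::kAccept;  // periodic resend from a known member
    }
    if (processed_at > deadline_) {
      return NodeUpAction::kRebootLate;  // the "late node_up_msg" case
    }
    members_.insert(node_id);
    return NodeUpAction::kAccept;
  }

 private:
  Clock::time_point deadline_;
  std::set<uint32_t> members_;
};
```

For example, with an assumed 10 s window, a node_up processed 1 s after the window opens is accepted, while one that only reaches the main thread 14 s in gets a reboot order, even if the MDS thread received it within the first second.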
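
When the problem occurs, AMFD logs no node_up messages for periods of up to
15-20 seconds. The following patch adds a log statement in the MDS receive
path, just before the event is posted to the main thread's mailbox. With it
applied, the log shows that the node_up messages are in fact received by AMFD's
MDS thread (10517) roughly every second from 17:20:32, but do not reach the
main thread (10514) until 17:20:46, after the cluster startup timeout. The main
thread then orders PL-3 and PL-4 to reboot.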
```diff
--- a/src/amf/amfd/ndmsg.cc
+++ b/src/amf/amfd/ndmsg.cc
@@ -364,19 +364,24 @@
     default:
       evt->rcv_evt = static_cast<AVD_EVT_TYPE>(
           (rcv_msg->msg_type - AVSV_N2D_NODE_UP_MSG) + AVD_EVT_NODE_UP_MSG);
       break;
   }
 
   osafassert((AVD_EVT_INVALID < evt->rcv_evt) && (evt->rcv_evt < AVD_EVT_MAX));
 
   evt->info.avnd_msg = rcv_msg;
+  if (evt->rcv_evt == AVD_EVT_NODE_UP_MSG) {
+    LOG_NO("MDS Thread: Received node_up from %x: msg_id %u",
+           evt->info.avnd_msg->msg_info.n2d_node_up.node_id,
+           evt->info.avnd_msg->msg_info.n2d_node_up.msg_id);
+  }
 
   if (m_NCS_IPC_SEND(&cb->avd_mbx, evt, NCS_IPC_PRIORITY_HIGH) !=
       NCSCC_RC_SUCCESS) {
     LOG_ER("%s: ncs_ipc_send failed", __FUNCTION__);
     avsv_dnd_msg_free(rcv_msg);
     evt->info.avnd_msg = nullptr;
     delete evt;
     TRACE_LEAVE();
     return NCSCC_RC_FAILURE;
   }
```
```
var/log/opensaf/osafamfd:Jun 20 17:20:32.498510 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:32.499659 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:32.516707 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:33.599087 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:33.599784 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:33.618200 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:34.699435 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:34.700674 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:34.720039 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:35.799851 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:35.801838 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:35.821058 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:36.900386 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:36.902192 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:36.922864 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:38.000570 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:38.002417 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:38.023179 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:39.101071 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:39.103714 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:39.124949 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:40.201609 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:40.204156 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:40.244475 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:41.301996 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:41.304880 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:41.345560 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:42.402541 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:42.405582 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:42.445846 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:43.502786 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:43.505630 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:43.546256 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:44.603419 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:44.607302 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:44.647594 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:45.703701 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:45.707718 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:45.748243 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:46.054654 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0296] >> avd_node_up_evh: from 2030f, safAmfNode=PL-3,safAmfCluster=myAmfCluster
var/log/opensaf/osafamfd:Jun 20 17:20:46.054675 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0254] NO Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:46.054682 osafamfd [10514:10514:src/amf/amfd/util.cc:0203] >> avd_snd_node_up_msg
var/log/opensaf/osafamfd:Jun 20 17:20:46.054695 osafamfd [10514:10514:src/amf/amfd/util.cc:0236] << avd_snd_node_up_msg
var/log/opensaf/osafamfd:Jun 20 17:20:46.054734 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0439] WA Sending node reboot order to node:safAmfNode=PL-3,safAmfCluster=myAmfCluster, due to late node_up_msg after cluster startup timeout
var/log/opensaf/osafamfd:Jun 20 17:20:46.054749 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0534] << avd_node_up_evh
var/log/opensaf/osafamfd:Jun 20 17:20:46.055356 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0296] >> avd_node_up_evh: from 2040f, safAmfNode=PL-4,safAmfCluster=myAmfCluster
var/log/opensaf/osafamfd:Jun 20 17:20:46.055378 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0254] NO Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:46.055387 osafamfd [10514:10514:src/amf/amfd/util.cc:0203] >> avd_snd_node_up_msg
var/log/opensaf/osafamfd:Jun 20 17:20:46.055402 osafamfd [10514:10514:src/amf/amfd/util.cc:0236] << avd_snd_node_up_msg
var/log/opensaf/osafamfd:Jun 20 17:20:46.055428 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0439] WA Sending node reboot order to node:safAmfNode=PL-4,safAmfCluster=myAmfCluster, due to late node_up_msg after cluster startup timeout
```
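
The diagnosis above works by comparing when the MDS thread logs a node_up with when the main thread handles it. A hypothetical way to make that comparison explicit is a mailbox that timestamps each event on enqueue and reports its queueing delay on dequeue. `TimedMailbox` and `NodeUpEvent` are illustrative names built on the C++ standard library; this is a sketch under that assumption, not the OpenSAF `m_NCS_IPC_SEND` mailbox API.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical event type; amfd itself posts AVD_EVT objects.
struct NodeUpEvent {
  uint32_t node_id;
  uint32_t msg_id;
};

// Hypothetical sketch: a FIFO mailbox that records when the producer (MDS)
// thread enqueued each event, so the consumer (main) thread can report how
// long the event waited before it was processed.
class TimedMailbox {
 public:
  using Clock = std::chrono::steady_clock;

  // Producer side: stamp the event with its enqueue time and wake a consumer.
  void Send(const NodeUpEvent& evt) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.emplace_back(evt, Clock::now());
    }
    cv_.notify_one();
  }

  // Consumer side: block until an event is available, then return it
  // together with the time it spent in the queue.
  std::pair<NodeUpEvent, Clock::duration> Receive() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !queue_.empty(); });
    const std::pair<NodeUpEvent, Clock::time_point> item = queue_.front();
    queue_.pop_front();
    return {item.first, Clock::now() - item.second};
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::pair<NodeUpEvent, Clock::time_point>> queue_;
};
```

On the main thread, logging the returned delay for each node_up, or warning when it exceeds the roughly 1 s PL resend interval, would turn the gap between 17:20:32 and 17:20:46 in the log into a direct measurement. It only shows that events are waiting, not why the main thread is not draining the queue.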