- Description has changed:

Diff:

~~~~

--- old
+++ new
@@ -1,8 +1,8 @@
-When recovering from SC absence, sometimes all PLs are rebooted because node 
ups are not being processed by amfd.
+When recovering from SC absence, sometimes all PLs are rebooted because node 
ups are not being processed by amfd in a timely manner.
 
-Normally, node ups froms PL should be received by AMFD every 1s during the 
sync window. Eg. "NO Received node_up from 2040f: msg_id 1".
+Normally, node ups from PLs should be received by AMFD every 1s during the 
sync window. Eg. "NO Received node_up from 2040f: msg_id 1".
 
-When the problem occurs, the node up messages are not logged by AMFD for 
periods of up to 15-20 seconds. After applying the following patch, it can be 
seen that the node ups are in fact received by AMFD, but not received by the 
main thread.
+When the problem occurs, the node up messages are not logged by AMFD for 
periods of up to 15-20 seconds. After applying the following patch, it can be 
seen that the node ups are in fact received by AMFD in the MDS thread, but not 
received by the main thread.
 
 ```
      default:

~~~~




---

** [tickets:#2510] amfd: payloads rebooted when recovering from SC absence **

**Status:** accepted
**Milestone:** 5.17.08
**Created:** Fri Jun 23, 2017 01:34 AM UTC by Gary Lee
**Last Updated:** Fri Jun 23, 2017 01:34 AM UTC
**Owner:** Gary Lee


When recovering from SC absence, all PLs are sometimes rebooted because their 
node_up messages are not processed by amfd in a timely manner.

Normally, AMFD should receive a node_up from each PL about every 1s during the 
sync window, e.g. "NO Received node_up from 2040f: msg_id 1".

When the problem occurs, AMFD does not log the node_up messages for periods of 
up to 15-20 seconds. After applying the following patch, it can be seen that 
the node_up messages are in fact received by AMFD in the MDS thread, but are 
not received by the main thread.

```
     default:
       evt->rcv_evt = static_cast<AVD_EVT_TYPE>(
           (rcv_msg->msg_type - AVSV_N2D_NODE_UP_MSG) + AVD_EVT_NODE_UP_MSG);
       break;
   }
 
   osafassert((AVD_EVT_INVALID < evt->rcv_evt) && (evt->rcv_evt < AVD_EVT_MAX));
 
   evt->info.avnd_msg = rcv_msg;
+  if (evt->rcv_evt == AVD_EVT_NODE_UP_MSG) {
+    LOG_NO("MDS Thread: Received node_up from %x: msg_id %u",
+           evt->info.avnd_msg->msg_info.n2d_node_up.node_id,
+           evt->info.avnd_msg->msg_info.n2d_node_up.msg_id);
+  }
 
   if (m_NCS_IPC_SEND(&cb->avd_mbx, evt, NCS_IPC_PRIORITY_HIGH) !=
       NCSCC_RC_SUCCESS) {
     LOG_ER("%s: ncs_ipc_send failed", __FUNCTION__);
     avsv_dnd_msg_free(rcv_msg);
     evt->info.avnd_msg = nullptr;
     delete evt;
     TRACE_LEAVE();
     return NCSCC_RC_FAILURE;
   }
```
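
The patch logs in the MDS thread just before the event is handed to the mailbox (`cb->avd_mbx`), so a gap between that log line and the main thread's own "Received node_up" line is time the event spent queued. The following is a minimal, self-contained sketch of that effect, not AMFD code: `Mailbox`, `Event` and `queue_delays_ms` are hypothetical names, a mutex-protected `std::deque` stands in for the real mailbox, and the sleep only simulates a consumer that is occupied with something else. It makes no claim about why the AMFD main thread was busy.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <initializer_list>
#include <mutex>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for an AMFD event; it only records when it was posted.
struct Event {
  unsigned node_id;
  Clock::time_point posted;
};

// Hypothetical stand-in for the mailbox shared by the two threads.
class Mailbox {
 public:
  void post(unsigned node_id) {
    std::lock_guard<std::mutex> lock(mu_);
    q_.push_back(Event{node_id, Clock::now()});
  }

  // Returns false when the mailbox is empty.
  bool take(Event* out) {
    std::lock_guard<std::mutex> lock(mu_);
    if (q_.empty()) return false;
    *out = q_.front();
    q_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::deque<Event> q_;
};

// Posts three node_up events from a receiver thread, keeps the consumer
// busy for `busy`, then drains the mailbox. Returns the time each event
// spent queued, in milliseconds.
std::vector<long long> queue_delays_ms(std::chrono::milliseconds busy) {
  assert(busy.count() >= 0);
  Mailbox mbx;

  std::thread receiver([&mbx] {
    for (unsigned id : {0x2020fu, 0x2030fu, 0x2040fu}) mbx.post(id);
  });
  receiver.join();  // all three events are queued from here on

  std::this_thread::sleep_for(busy);  // consumer is occupied elsewhere

  std::vector<long long> delays;
  Event ev;
  while (mbx.take(&ev)) {
    const auto waited = Clock::now() - ev.posted;
    delays.push_back(
        std::chrono::duration_cast<std::chrono::milliseconds>(waited).count());
  }
  return delays;
}
```

Calling `queue_delays_ms(std::chrono::milliseconds(50))` returns three delays, each at least 50 ms: the events were received promptly, but handled only once the consumer got around to reading the mailbox.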

```
var/log/opensaf/osafamfd:Jun 20 17:20:32.498510 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:32.499659 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:32.516707 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:33.599087 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:33.599784 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:33.618200 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:34.699435 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:34.700674 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:34.720039 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:35.799851 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:35.801838 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:35.821058 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:36.900386 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:36.902192 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:36.922864 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:38.000570 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:38.002417 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:38.023179 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:39.101071 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:39.103714 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:39.124949 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:40.201609 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:40.204156 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:40.244475 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:41.301996 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:41.304880 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:41.345560 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:42.402541 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:42.405582 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:42.445846 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:43.502786 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:43.505630 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:43.546256 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:44.603419 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:44.607302 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:44.647594 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:45.703701 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2020f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:45.707718 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:45.748243 osafamfd [10514:10517:src/amf/amfd/ndmsg.cc:0376] NO MDS Thread: Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:46.054654 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0296] >> avd_node_up_evh: from 2030f, safAmfNode=PL-3,safAmfCluster=myAmfCluster
var/log/opensaf/osafamfd:Jun 20 17:20:46.054675 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0254] NO Received node_up from 2030f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:46.054682 osafamfd [10514:10514:src/amf/amfd/util.cc:0203] >> avd_snd_node_up_msg
var/log/opensaf/osafamfd:Jun 20 17:20:46.054695 osafamfd [10514:10514:src/amf/amfd/util.cc:0236] << avd_snd_node_up_msg
var/log/opensaf/osafamfd:Jun 20 17:20:46.054734 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0439] WA Sending node reboot order to node:safAmfNode=PL-3,safAmfCluster=myAmfCluster, due to late node_up_msg after cluster startup timeout
var/log/opensaf/osafamfd:Jun 20 17:20:46.054749 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0534] << avd_node_up_evh
var/log/opensaf/osafamfd:Jun 20 17:20:46.055356 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0296] >> avd_node_up_evh: from 2040f, safAmfNode=PL-4,safAmfCluster=myAmfCluster
var/log/opensaf/osafamfd:Jun 20 17:20:46.055378 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0254] NO Received node_up from 2040f: msg_id 1
var/log/opensaf/osafamfd:Jun 20 17:20:46.055387 osafamfd [10514:10514:src/amf/amfd/util.cc:0203] >> avd_snd_node_up_msg
var/log/opensaf/osafamfd:Jun 20 17:20:46.055402 osafamfd [10514:10514:src/amf/amfd/util.cc:0236] << avd_snd_node_up_msg
var/log/opensaf/osafamfd:Jun 20 17:20:46.055428 osafamfd [10514:10514:src/amf/amfd/ndfsm.cc:0439] WA Sending node reboot order to node:safAmfNode=PL-4,safAmfCluster=myAmfCluster, due to late node_up_msg after cluster startup timeout
```
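
In the excerpt above, the MDS thread (10514:10517) first logs a node_up from 2030f at 17:20:32.499659, while the main thread (10514:10514) logs "Received node_up from 2030f" only at 17:20:46.054675, roughly 13.6 seconds later and after the cluster startup timeout. The helper below is a small sketch for computing that gap from two trace timestamps. `to_micros` and `delay_seconds` are hypothetical names, and the parser assumes the `HH:MM:SS.uuuuuu` form with exactly six fractional digits seen in this log, with both timestamps on the same day.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Converts an "HH:MM:SS.uuuuuu" timestamp, as printed in the osafamfd
// trace above, to microseconds since midnight. Returns -1 if the text
// does not match that form.
std::int64_t to_micros(const std::string& ts) {
  int h = 0, m = 0, s = 0, us = 0;
  if (std::sscanf(ts.c_str(), "%2d:%2d:%2d.%6d", &h, &m, &s, &us) != 4) {
    return -1;
  }
  return (h * 3600LL + m * 60LL + s) * 1000000LL + us;
}

// Delay in seconds between the MDS-thread log line and the main-thread
// log line for the same node_up. Both timestamps must be well-formed and
// taken from the same day.
double delay_seconds(const std::string& mds_thread_ts,
                     const std::string& main_thread_ts) {
  const std::int64_t received = to_micros(mds_thread_ts);
  const std::int64_t handled = to_micros(main_thread_ts);
  assert(received >= 0 && handled >= received);
  return static_cast<double>(handled - received) / 1e6;
}
```

For the 2030f lines quoted above, `delay_seconds("17:20:32.499659", "17:20:46.054675")` gives 13.555016 seconds.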

