develop:
commit 1f788477b211cc565b5a432ce3a173c83d22e2b4
Author: Gary Lee <[email protected]>
Date: Mon Oct 29 06:13:19 2018 +0000
amfd: ensure node_sync_window_closed is set [#2946]
If all nodes are synced after headless, the timer is stopped
but node_sync_window_closed is never set to true.
Later on, if a node becomes split from the main network and
rejoins, it will send a headless sync to amfd.
amfd then goes into a never-ending loop of processing the message,
putting it back into the queue, and so on.
When the node sync timer is stopped, ensure node_sync_window_closed
is set.
Also modify avd_count_node_up() to count standby SC.
Sometimes a node_up from the standby SC arrives before mds up,
and the standby SC is incorrectly included in the node sync
count. Then a legitimate node_up from a PL is not accepted
because node_sync_window_closed is prematurely set.
---
**[tickets:#2946] amfd: set node_sync_window_closed when timer is stopped**
**Status:** review
**Milestone:** 5.18.12
**Created:** Thu Oct 25, 2018 10:36 AM UTC by Gary Lee
**Last Updated:** Mon Oct 29, 2018 06:53 AM UTC
**Owner:** Gary Lee
When testing split network partitions, AMFD sometimes gets into a loop
processing the messages below. Eventually it is aborted by the watchdog.
2018-10-25 16:41:43.051 SC-1 osafamfd[272]: NO Receive message with event
type:13, msg_type:32, from node:2030f, msg_id:0
2018-10-25 16:41:43.052 SC-1 osafamfd[272]: NO Receive message with event
type:12, msg_type:31, from node:2030f, msg_id:0
2018-10-25 16:41:43.052 SC-1 osafamfd[272]: NO Receive message with event
type:13, msg_type:32, from node:2030f, msg_id:0
2018-10-25 16:41:43.052 SC-1 osafamfd[272]: NO Receive message with event
type:12, msg_type:31, from node:2030f, msg_id:0
2018-10-25 16:41:43.052 SC-1 osafamfd[272]: NO Receive message with event
type:13, msg_type:32, from node:2030f, msg_id:0
2018-10-25 16:41:43.052 SC-1 osafamfd[272]: NO Receive message with event
type:12, msg_type:31, from node:2030f, msg_id:0
2018-10-25 16:41:43.053 SC-1 osafamfd[272]: NO Receive message with event
type:13, msg_type:32, from node:2030f, msg_id:0
2018-10-25 16:41:43.053 SC-1 osafamfd[272]: NO Receive message with event
type:12, msg_type:31, from node:2030f, msg_id:0
2018-10-25 16:41:43.053 SC-1 osafamfd[272]: NO Receive message with event
type:13, msg_type:32, from node:2030f, msg_id:0
2018-10-25 16:41:43.053 SC-1 osafamfd[272]: NO Receive message with event
type:12, msg_type:31, from node:2030f, msg_id:0
2018-10-25 16:41:43.054 SC-1 osafamfd[272]: NO Receive message with event
type:13, msg_type:32, from node:2030f, msg_id:0
The patch below fixes it. AMF did not expect any more sync messages after all
nodes had synced, but a PL that has been split from the main network partition
also sends a headless sync msg when it rejoins.
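To make the failure mode concrete, here is a minimal, self-contained model of the flag handling described above. It is only a sketch and not the amfd code: `ControlBlock`, `on_all_nodes_up()` and `drain_late_sync()` are hypothetical names, the re-queue rule is an assumption based on the ticket text, and the loop is capped with `max_spins` so the unpatched case terminates.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical, simplified stand-in for the amfd control block.
// Only the three flags named in the ticket are modelled.
struct ControlBlock {
  bool node_sync_tmr_active = true;
  bool node_sync_window_closed = false;
  bool all_nodes_synced = false;
};

// Models the branch touched by the patch: when every node has sent node_up,
// the node sync timer is stopped. With apply_fix the window is closed too;
// with apply_fix == false the pre-patch behaviour is reproduced.
void on_all_nodes_up(ControlBlock &cb, bool apply_fix) {
  if (cb.node_sync_tmr_active) {
    cb.node_sync_tmr_active = false;  // stands in for avd_stop_tmr()
    if (apply_fix) cb.node_sync_window_closed = true;
  }
  cb.all_nodes_synced = true;
}

// Assumed handling of a late headless sync message: while the sync window
// is still open, the message is deferred by putting it back on the queue.
// Returns how many times the message was re-queued, capped at max_spins.
std::size_t drain_late_sync(const ControlBlock &cb, std::size_t max_spins) {
  std::deque<int> queue{1};  // one pending headless sync message
  std::size_t requeues = 0;
  while (!queue.empty() && requeues < max_spins) {
    int msg = queue.front();
    queue.pop_front();
    if (!cb.node_sync_window_closed) {
      queue.push_back(msg);  // deferred again; nothing closes the window
      ++requeues;
    }
    // Otherwise the window is closed and the message is consumed.
  }
  return requeues;
}
```

In this model, calling `on_all_nodes_up(cb, false)` followed by `drain_late_sync(cb, 1000)` spins until the cap is hit, which corresponds to the repeated log lines above. With `apply_fix` set to true the message is consumed on the first pass.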
diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index c460d8f..1bc6ed9 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -357,6 +357,7 @@ void avd_node_up_evh(AVD_CL_CB *cb, AVD_EVT *evt) {
 if (cb->node_sync_tmr.is_active) {
   avd_stop_tmr(cb, &cb->node_sync_tmr);
   TRACE("stop NodeSync timer");
+  cb->node_sync_window_closed = true;
 }
 cb->all_nodes_synced = true;
 LOG_NO("Received node_up_msg from all nodes");