If a PL rejoins the main network partition before the node failover timer expires, AMFD orders it to reboot. AMFND, however, believes it has been headless and has reset rcv_msg_id to 0, so it logs the following when it receives the reboot message from AMFD:

Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Message ID mismatch, rec xx, expected 1, OwnNodeId = xx, SupervisionTime = 60

We can avoid this by resetting snd_msg_id for this PL in AMFD in state LostFound, before the reboot msg is sent.
---
 src/amf/amfd/node_state.cc | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/amf/amfd/node_state.cc b/src/amf/amfd/node_state.cc
index a8659dcf7..787ddab94 100644
--- a/src/amf/amfd/node_state.cc
+++ b/src/amf/amfd/node_state.cc
@@ -126,6 +126,11 @@ void LostFound::TimerExpired() {
          node->node_name.c_str());
 
   if (fsm_->Active() == true) {
+    // amfnd thinks it's been headless and resets its rcv_msg_id to 0,
+    // also do the same here to avoid 'Message ID mismatch' errors
+    // at amfnd
+    node->snd_msg_id = 0;
+
     LOG_WA("Sending node reboot order");
     avd_d2n_reboot_snd(node);
 
-- 
2.17.1
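
For readers unfamiliar with the message ID check, here is a minimal standalone sketch of the failure and of the fix. It is not OpenSAF code: the Sender and Receiver types and the reboot_order_accepted() helper are hypothetical, and the rule that the receiver accepts only rcv_msg_id + 1 is an assumption inferred from the "expected 1" in the log line above. Only the counter names snd_msg_id and rcv_msg_id come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the director's per-node send counter.
struct Sender {
  uint32_t snd_msg_id = 0;
  // Each outgoing message carries the next ID in sequence.
  uint32_t next_msg_id() { return ++snd_msg_id; }
};

// Hypothetical stand-in for the node director's receive counter.
struct Receiver {
  uint32_t rcv_msg_id = 0;
  // Assumed rule: accept only the ID that directly follows the last one.
  bool accept(uint32_t msg_id) {
    if (msg_id != rcv_msg_id + 1) return false;  // "Message ID mismatch"
    rcv_msg_id = msg_id;
    return true;
  }
};

// Returns whether the reboot order is accepted after the node rejoins.
bool reboot_order_accepted(bool reset_sender_first) {
  Sender amfd;
  Receiver amfnd;
  // Normal traffic before the partition: both counters advance together.
  for (int i = 0; i < 5; ++i) {
    if (!amfnd.accept(amfd.next_msg_id())) return false;
  }
  // The node believes it has been headless and starts counting from 0 again.
  amfnd.rcv_msg_id = 0;
  // The idea of the patch: bring the sender back in step before sending.
  if (reset_sender_first) amfd.snd_msg_id = 0;
  return amfnd.accept(amfd.next_msg_id());
}

// Exercises both cases: without the reset the order is rejected,
// with the reset it is accepted.
void run_example() {
  assert(!reboot_order_accepted(false));
  assert(reboot_order_accepted(true));
}
```

Without the reset the sender's next ID is 6 while the receiver expects 1, which corresponds to the mismatch reported in the log. With the reset both sides agree on 1.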