- **Milestone**: 5.0.1 --> 4.7.2


---

**[tickets:#2029] imm: fevs message lost during failover**

**Status:** review
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 11:05 AM UTC by Hung Nguyen
**Last Updated:** Tue Sep 20, 2016 05:49 PM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [logs.7z](https://sourceforge.net/p/opensaf/tickets/2029/attachment/logs.7z) (250.9 kB; application/octet-stream)


A fevs message is lost during a failover between the two SCs.

~~~
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. Marking it as doomed 232 <754, 2010f> (@OpenSafImmPBE)
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. Marking it as doomed 233 <755, 2010f> (OsafImmPbeRt_B)
...
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer disconnected 233 <755, 2010f> (OsafImmPbeRt_B)
~~~

The IMMNDs never receive the D2ND_DISCARD_IMPL for @OpenSafImmPBE, so that 
applier stays marked as dying.

~~~
Sep  8 11:50:02 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
Sep  8 11:50:03 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
Sep  8 11:50:04 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
...
Sep  8 11:59:08 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
Sep  8 11:59:09 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
Sep  8 11:59:10 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
...
~~~

The main problem is that the standby IMMD also broadcasts a D2ND_DISCARD_NODE 
message when it receives an NCSMDS_DOWN event from an IMMND. See 
immd_process_immnd_down().

If the NCSMDS_DOWN event reaches both IMMDs at the same time, the two 
D2ND_DISCARD_NODE messages are stamped with the same number. The IMMNDs discard 
one of the two as a duplicate, which is harmless.
But if the NCSMDS_DOWN event reaches one IMMD later than the other, another 
fevs message (in this case the D2ND_DISCARD_IMPL for @OpenSafImmPBE) ends up 
with the same number and is discarded by the IMMNDs instead, so that fevs 
message is lost.
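
The collision can be illustrated with a small toy model. This is only a sketch and not OpenSAF code: the `Immd` and `Immnd` classes, the `run()` helper, the starting count and the rule "drop any message whose number was already seen" are all simplifying assumptions made for illustration. The delayed case also assumes one particular ordering (the standby IMMD reacts to NCSMDS_DOWN before the active one).

```python
from dataclasses import dataclass, field


@dataclass
class Immd:
    """Hypothetical stand-in for an IMMD that stamps fevs messages."""
    name: str
    fevs_count: int  # number of the last fevs message this IMMD knows about

    def broadcast(self, msg_type):
        # Stamp the outgoing message with the next fevs number.
        self.fevs_count += 1
        return (self.fevs_count, msg_type, self.name)


@dataclass
class Immnd:
    """Hypothetical stand-in for an IMMND that drops already-seen numbers."""
    last_seen: int
    applied: list = field(default_factory=list)
    discarded: list = field(default_factory=list)

    def receive(self, msg):
        number, msg_type, _sender = msg
        if number <= self.last_seen:
            # Number already seen: treated as a duplicate, whatever the type.
            self.discarded.append(msg_type)
        else:
            self.last_seen = number
            self.applied.append(msg_type)


def run(down_at_same_time):
    start = 10436  # illustrative starting count, assumed equal on both IMMDs
    active = Immd("active", start)
    standby = Immd("standby", start)
    immnd = Immnd(last_seen=start)

    if down_at_same_time:
        # Both IMMDs react to NCSMDS_DOWN before any other fevs message.
        immnd.receive(active.broadcast("D2ND_DISCARD_NODE"))
        immnd.receive(standby.broadcast("D2ND_DISCARD_NODE"))
        immnd.receive(active.broadcast("D2ND_DISCARD_IMPL"))
    else:
        # Assumed ordering: the active IMMD sees NCSMDS_DOWN late, so its
        # D2ND_DISCARD_IMPL collides with the standby's D2ND_DISCARD_NODE.
        immnd.receive(standby.broadcast("D2ND_DISCARD_NODE"))
        immnd.receive(active.broadcast("D2ND_DISCARD_IMPL"))
        immnd.receive(active.broadcast("D2ND_DISCARD_NODE"))
    return immnd


if __name__ == "__main__":
    for same_time in (True, False):
        result = run(same_time)
        print(same_time, "applied:", result.applied,
              "discarded:", result.discarded)
```

In this model the simultaneous case only drops the redundant D2ND_DISCARD_NODE, while the delayed case drops the D2ND_DISCARD_IMPL and applies D2ND_DISCARD_NODE twice.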

The details of the problem are explained in this sequence diagram:

http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdARAjACgGUAVAIQEoAoUSWeEJNTLAJjwEEBhIy66ORFBnQA5LAGcKFMACMA9gA9ksgG4BTMI2z5i5ADRCW7LmQBcMAK5gwqhgDNVysQRsATDrPNITyHAAYALADMAGySMgpKahpComImMVhKCMgEHMzIUGLILrIA7ghhcooq6pq4hKSmwhwE2AQA+lgA8gDqwsgONhCSkgbalQC0AMTWLgB8CXEsoo2oqWwASlj1wk1YAKLIYhAgALbAqi7IuVAQABY+AYEA7L1MrJzcADxD0gA25qoDkyaizMtYOYcRbLDAABQAMshbLINAABJoHBAEEC2VC7XZgkjrQoRErRe5GbgmeyOZwINweBiZZAIWQoczAFwgCCHZAQWQAc1U51KJ3OZRwAB0ECKxLIMihtntgFlechdqoxGIQNzjqcLn4grcKAYHsZhqMJphYiZpgCgSD6uCodL9mz+Zqrjq+hVyC9OdYbN9CY9TOgSBwxMoFUqVWqYRotTdccUooK3aYwWAoAwPCh5blwAhU5yRSKWmxkOgw6rVMgYFSICZo9dkABqHzIACEAF5LtqeuE46UfkQzuXJtlMjBwEdzbN5ktrehIaHlWX8whpKpR+YxOXZLZsoy3rAWWzSVkEOZdiuwEv+yyAORZXJnACeyARSJRaIxWM2NLpKFs5jebxPi4I5jocsaRL2vrGL8NR1I0rTtJ0SCXmcNJISglaKnKEp6sgbwHho5z0IKQA

~~~
Sep  8 11:50:00 SC-2-1 osafimmd[4226]: WA IMMND DOWN on active controller 2 detected at standby immd!! 1. Possible failover
...
Sep  8 11:50:00 SC-2-1 osafimmd[4226]: WA Message count:10437 + 1 != 10437
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: WA DISCARD DUPLICATE FEVS message:10437
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: WA Error code 2 returned for message type 82 - ignoring
~~~

The logs are attached.

