- Attachments has changed:
Diff:
~~~~
--- old
+++ new
@@ -1 +0,0 @@
-logs.7z (250.9 kB; application/octet-stream)
~~~~
---
**[tickets:#2029] imm: fevs message lost during failover**
**Status:** fixed
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 11:05 AM UTC by Hung Nguyen
**Last Updated:** Thu Sep 22, 2016 12:03 PM UTC
**Owner:** Hung Nguyen
A fevs message is lost when failing over between the 2 SCs.
~~~
Sep 8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. Marking it as doomed 232 <754, 2010f> (@OpenSafImmPBE)
Sep 8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. Marking it as doomed 233 <755, 2010f> (OsafImmPbeRt_B)
...
Sep 8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer disconnected 233 <755, 2010f> (OsafImmPbeRt_B)
~~~
The IMMNDs never receive the D2ND_DISCARD_IMPL for @OpenSafImmPBE, so that
applier stays marked as dying:
~~~
Sep 8 11:50:02 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
Sep 8 11:50:03 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
Sep 8 11:50:04 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
...
Sep 8 11:59:08 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
Sep 8 11:59:09 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
Sep 8 11:59:10 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports missing PbeBSlave locally => unsafe
...
~~~
The main problem is that the standby IMMD also broadcasts a D2ND_DISCARD_NODE
message when it receives an NCSMDS_DOWN for an IMMND. See
immd_process_immnd_down().
If the NCSMDS_DOWN event reaches the 2 IMMDs at the same time, the 2
D2ND_DISCARD_NODE messages are stamped with the same number. The IMMNDs
discard one of the 2 as a duplicate, so there is no problem here.
But if the NCSMDS_DOWN event is delayed, another fevs message (in this
case the D2ND_DISCARD_IMPL for @OpenSafImmPBE) is discarded by the IMMNDs
instead, and that fevs message is lost.
Details of the problem are explained here:
http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdARAjACgGUAVAIQEoAoUSWeEJNTLAJjwEEBhIy66ORFBnQA5LAGcKFMACMA9gA9ksgG4BTMI2z5i5ADRCW7LmQBcMAK5gwqhgDNVysQRsATDrPNITyHAAYALADMAGySMgpKahpComImMVhKCMgEHMzIUGLILrIA7ghhcooq6pq4hKSmwhwE2AQA+lgA8gDqwsgONhCSkgbalQC0AMTWLgB8CXEsoo2oqWwASlj1wk1YAKLIYhAgALbAqi7IuVAQABY+AYEA7L1MrJzcADxD0gA25qoDkyaizMtYOYcRbLDAABQAMshbLINAABJoHBAEEC2VC7XZgkjrQoRErRe5GbgmeyOZwINweBiZZAIWQoczAFwgCCHZAQWQAc1U51KJ3OZRwAB0ECKxLIMihtntgFlechdqoxGIQNzjqcLn4grcKAYHsZhqMJphYiZpgCgSD6uCodL9mz+Zqrjq+hVyC9OdYbN9CY9TOgSBwxMoFUqVWqYRotTdccUooK3aYwWAoAwPCh5blwAhU5yRSKWmxkOgw6rVMgYFSICZo9dkABqHzIACEAF5LtqeuE46UfkQzuXJtlMjBwEdzbN5ktrehIaHlWX8whpKpR+YxOXZLZsoy3rAWWzSVkEOZdiuwEv+yyAORZXJnACeyARSJRaIxWM2NLpKFs5jebxPi4I5jocsaRL2vrGL8NR1I0rTtJ0SCXmcNJISglaKnKEp6sgbwHho5z0IKQA
~~~
Sep 8 11:50:00 SC-2-1 osafimmd[4226]: WA IMMND DOWN on active controller 2 detected at standby immd!! 1. Possible failover
...
Sep 8 11:50:00 SC-2-1 osafimmd[4226]: WA Message count:10437 + 1 != 10437
Sep 8 11:50:00 SC-2-1 osafimmnd[4241]: WA DISCARD DUPLICATE FEVS message:10437
Sep 8 11:50:00 SC-2-1 osafimmnd[4241]: WA Error code 2 returned for message type 82 - ignoring
~~~
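To make the number collision easier to follow, here is a toy model in Python. It is only a sketch and not OpenSAF code: the `Immd` and `Immnd` classes, their fields, and the starting count of 10436 are all invented for illustration, and the ordering in `delayed_down()` (standby sees the NCSMDS_DOWN first) is an assumption. The only behaviour taken from the ticket is that each IMMD stamps its broadcast with the next number and that an IMMND drops a message whose number it has already seen.

```python
from dataclasses import dataclass, field


@dataclass
class Immd:
    """Hypothetical IMMD: stamps each broadcast with its next fevs number."""
    name: str
    fevs_count: int

    def broadcast(self, msg_type):
        self.fevs_count += 1
        return (self.fevs_count, msg_type, self.name)


@dataclass
class Immnd:
    """Hypothetical IMMND: drops any message whose number was already seen."""
    last_seen: int
    applied: list = field(default_factory=list)
    discarded: list = field(default_factory=list)

    def receive(self, msg):
        number, msg_type, _sender = msg
        if number <= self.last_seen:
            # Corresponds to "DISCARD DUPLICATE FEVS message" in the log.
            self.discarded.append(msg_type)
            return
        self.last_seen = number
        self.applied.append(msg_type)


def simultaneous_down(start=10436):
    """Both IMMDs see NCSMDS_DOWN together: the duplicate is harmless."""
    active, standby = Immd("active", start), Immd("standby", start)
    immnd = Immnd(last_seen=start)
    immnd.receive(active.broadcast("D2ND_DISCARD_NODE"))
    immnd.receive(standby.broadcast("D2ND_DISCARD_NODE"))
    return immnd


def delayed_down(start=10436):
    """Assumed ordering: the standby sees NCSMDS_DOWN first, while the
    active IMMD sends a different message under the same number."""
    active, standby = Immd("active", start), Immd("standby", start)
    immnd = Immnd(last_seen=start)
    immnd.receive(standby.broadcast("D2ND_DISCARD_NODE"))
    immnd.receive(active.broadcast("D2ND_DISCARD_IMPL"))
    return immnd


if __name__ == "__main__":
    print(simultaneous_down().discarded)
    print(delayed_down().discarded)
```

In the first scenario the dropped message is a second copy of D2ND_DISCARD_NODE, so nothing is lost. In the second, the dropped message is the D2ND_DISCARD_IMPL, which in this model is never applied.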
The logs are attached.
---
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets