As can be seen, there appear to be dropped messages.
The dropped messages are sent from the IMMD to the IMMNDs as MDS broadcasts,
so in some sense this can be seen as an overload symptom.
The main question is whether the new MDS broadcast, which uses TIPC broadcast,
has lower throughput than the older multicast implementation.
How the test is constructed is also relevant.
Are you pushing requests at as high a frequency as is accepted?
The fevs mechanism has flow control that kicks in when too many fevs
messages have been sent without a response. Perhaps the very fact that
TIPC broadcast is faster allows the test to push more messages per unit
time over fevs, but increases the risk of local TIPC buffer overflow at
some nodes?
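
To make the flow control point concrete, here is a minimal sketch of a
sender-side window that refuses new messages while too many are still
unanswered. It is only an illustration of the general idea, not the actual
fevs code: the class name FevsWindow, the max_pending limit and the method
names are all hypothetical.

```python
class FevsWindow:
    """Hypothetical sender-side window capping unanswered messages."""

    def __init__(self, max_pending):
        self.max_pending = max_pending  # assumed limit, not OpenSAF's real value
        self.pending = set()            # sequence numbers sent but not yet answered
        self.next_seq = 1

    def try_send(self):
        """Return the sequence number sent, or None if the window is full."""
        if len(self.pending) >= self.max_pending:
            return None  # the caller has to back off and retry later
        seq = self.next_seq
        self.next_seq += 1
        self.pending.add(seq)
        return seq

    def on_reply(self, seq):
        """A reply for seq arrived, so one slot in the window is freed."""
        self.pending.discard(seq)


window = FevsWindow(max_pending=2)
first = window.try_send()    # accepted
second = window.try_send()   # accepted, window is now full
blocked = window.try_send()  # rejected until a reply comes back
window.on_reply(first)
third = window.try_send()    # accepted again
```

With a window like this, a faster transport returns replies sooner and so
reopens the window sooner. The sender can then push more messages per unit
time even though the limit itself is unchanged, which is the situation
where a slower receiver's buffer could overflow.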
---
** [tickets:#1036] mds: IMMND restarts because of out of order messages**
**Status:** unassigned
**Milestone:** 4.5.0
**Created:** Tue Sep 02, 2014 01:06 PM UTC by Neelakanta Reddy
**Last Updated:** Tue Sep 02, 2014 01:16 PM UTC
**Owner:** nobody
Sep 2 05:16:57 SLES-SLOT-2 osafimmnd[21492]: WA MESSAGE:81414 OUT OF ORDER my
highest processed:81412, exiting
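
The log line above is consistent with a receiver that expects each message
number to be exactly one higher than the last one it processed. The sketch
below is an assumption about that check, not the IMMND source; the function
name missing_messages is hypothetical. It lists the message numbers that
were skipped.

```python
def missing_messages(highest_processed, msg_no):
    """Return the message numbers skipped between the last processed
    message and the one that just arrived (empty if it is in order)."""
    expected = highest_processed + 1
    return list(range(expected, msg_no))


# Values taken from the log line above.
gap = missing_messages(81412, 81414)
print(gap)
```

For the logged values the gap is the single message 81413, which fits the
theory that one broadcast message was dropped.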
Recreation steps:
1. The problem is reproduced when 100 switchovers are done.
2. Immediately after a failover is done,
the out-of-order message is observed.
3. Because of the out-of-order message, the new active IMMND restarted.
From there on:
Sep 2 05:17:04 SLES-SLOT-2 osafimmnd[10230]: WA Sync MESSAGE:81639 OUT OF
ORDER my highest processed:81637
Sep 2 05:17:09 SLES-SLOT-2 osafimmnd[10254]: WA Sync MESSAGE:81893 OUT OF
ORDER my highest processed:81891
Sep 2 05:17:16 SLES-SLOT-2 osafimmnd[10275]: WA Sync MESSAGE:82114 OUT OF
ORDER my highest processed:82112
Sep 2 05:17:20 SLES-SLOT-2 osafimmnd[10295]: WA Sync MESSAGE:82335 OUT OF
ORDER my highest processed:82333
4. Because of the constant IMMND restarts during sync, CLM got TRY_AGAIN and
the node went for reboot.
Sep 2 05:17:16 SLES-SLOT-2 osafimmd[10478]: NO Node 2020f request sync
sync-pid:10295 epoch:0
Sep 2 05:17:17 SLES-SLOT-2 osafclmd[10521]: ER saImmOiInitialize_2 failed 6,
exiting
Sep 2 05:17:17 SLES-SLOT-2 osafamfnd[10550]: NO
'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' :
Recovery is 'nodeFailfast'
Sep 2 05:17:17 SLES-SLOT-2 osafamfnd[10550]: ER
safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery
is:nodeFailfast
Sep 2 05:17:17 SLES-SLOT-2 osafamfnd[10550]: Rebooting OpenSAF NodeId = 131599
EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId =
131599, SupervisionTime = 60
---