Well, a hint is that you managed to bypass the problem (temporarily) by
increasing a queue size.
The error:
Sep 6 6:58:02.096641 osafimmnd [502:ImmModel.cc:1366] T2 ERR_TRY_AGAIN: Too many pending incoming fevs messages (> 16) rejecting sync iteration next request
is very rarely seen, but it can happen when the fevs turnaround is slower than
the rate of generated traffic.
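To make the rate-versus-turnaround point concrete, here is a minimal toy model in C. It is only a sketch: the function name, the tick-based timing and the parameters are hypothetical and are not taken from the OpenSAF sources. The only details borrowed from the thread are the "(> 16)" comparison in the log and the note that the real counter is a uint8_t.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: names and structure are hypothetical, not OpenSAF code.
 *
 * Each tick, `sent_per_tick` messages are submitted and up to
 * `acked_per_tick` outstanding messages complete their round trip.
 * Returns the first tick at which a submission would be rejected because
 * more than `max_pending` messages are outstanding, or -1 if that never
 * happens within `ticks` ticks. */
static int first_rejection_tick(unsigned max_pending, unsigned sent_per_tick,
                                unsigned acked_per_tick, int ticks)
{
    unsigned pending = 0;

    /* The patch comment says the real counter is a uint8_t. */
    assert(max_pending <= UINT8_MAX);

    for (int t = 0; t < ticks; t++) {
        for (unsigned i = 0; i < sent_per_tick; i++) {
            if (pending > max_pending)
                return t;   /* the point where ERR_TRY_AGAIN would be returned */
            pending++;
        }
        /* Drain whatever completed its round trip during this tick. */
        pending -= (pending < acked_per_tick) ? pending : acked_per_tick;
    }
    return -1;
}
```

In this model, `first_rejection_tick(16, 3, 2, 100)` returns 15, while `first_rejection_tick(16, 2, 2, 100)` returns -1 because the backlog never grows. With the limit raised to 255, the same 3-in, 2-out traffic produces no rejection within 100 ticks, but the backlog still grows by one message per tick, so the larger limit only postpones the rejection.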
So the question for you is simply why this happens in your setup and with your
traffic, or whether there is anything else unusual with your setup or traffic.
If the *only* imm traffic is sync traffic then it is really strange.
Again, this is a rare problem (in fact no one has complained about this before
that I can recall) and it involves a mechanism that has been there since the
start of OpenSAF.
If the same problem had popped up in testing of 4.5, it would indicate some
newly introduced problem.
But no one has reported any problem like this.
/AndersBj
________________________________
From: Adrian Szwej [mailto:[email protected]]
Sent: 15 September 2014 12:04
To: [opensaf:tickets]
Subject: [opensaf:tickets] Re: #1072 Sync stop after few payload nodes joining
the cluster (TCP)
I don't think it is a performance problem.
There is nothing indicating load on CPU, memory, or IO bandwidth.
Simply having a node join seems to trigger some "logical" bug.
There is no application, just pure OpenSAF.
I am now experimenting with different MDS configuration options and MDS
buffer settings together with MTU 9000, just to see if there is any difference
in triggering this bug.
OpenSAF is running inside containers, meaning there is no virtualization
overhead.
Could you give me a hint about what could cause the outstanding messages to reach 16?
E.g. could message loss or a timing issue lead to this?
I have more nodes configured than are actually joining at the moment,
around 10, but I am bringing them into the cluster one by one.
________________________________
[tickets:#1072]<http://sourceforge.net/p/opensaf/tickets/1072> Sync stop after
few payload nodes joining the cluster (TCP)
Status: invalid
Milestone: 4.3.3
Created: Fri Sep 12, 2014 09:20 PM UTC by Adrian Szwej
Last Updated: Mon Sep 15, 2014 07:45 AM UTC
Owner: Anders Bjornerstedt
Communication is MDS over TCP. The cluster is 2+3, and the scenario is:
start the SCs; start 1 payload; wait for sync; start the second payload; wait for sync;
start the 3rd payload. The third one fails, or sometimes it might be the fourth.
It is not possible to get more than 2 or 3 payloads synchronized, because the
bug is triggered consistently.
The following is triggered in the loading immnd, causing the joined node to
time out or fail to start up.
Sep 6 6:58:02.096550 osafimmnd [502:immsv_evt.c:5382] T8 Received: IMMND_EVT_A2ND_SEARCHNEXT (17) from 2020f
Sep 6 6:58:02.096575 osafimmnd [502:immnd_evt.c:1443] >> immnd_evt_proc_search_next
Sep 6 6:58:02.096613 osafimmnd [502:immnd_evt.c:1454] T2 SEARCH NEXT, Look for id:1664
Sep 6 6:58:02.096641 osafimmnd [502:ImmModel.cc:1366] T2 ERR_TRY_AGAIN: Too many pending incoming fevs messages (> 16) rejecting sync iteration next request
Sep 6 6:58:02.096725 osafimmnd [502:immnd_evt.c:1676] << immnd_evt_proc_search_next
Sep 6 6:58:03.133230 osafimmnd [502:immnd_proc.c:1980] IN Sync Phase-3: step:540
I have managed to overcome this bug temporarily by applying the following patch:
+++ b/osaf/libs/common/immsv/include/immsv_api.h Sat Sep 06 08:38:16 2014 +0000
@@ -70,7 +70,7 @@
/*Max # of outstanding fevs messages towards director.*/
/*Note max-max is 255. cb->fevs_replies_pending is an uint8_t*/
-#define IMMSV_DEFAULT_FEVS_MAX_PENDING 16
+#define IMMSV_DEFAULT_FEVS_MAX_PENDING 255
#define IMMSV_MAX_OBJECTS 10000
#define IMMSV_MAX_ATTRIBUTES 128
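The comment in the patch notes that the counter is a uint8_t, which is why 255 is the hard ceiling. The following C sketch illustrates what that ceiling implies. The helper names are hypothetical and the comparison is only assumed to mirror the "(> limit)" wording in the log; this is not the actual OpenSAF check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: increment an 8-bit pending counter the way a plain
 * `counter++` would, to show what happens at the ceiling. */
static uint8_t bump(uint8_t counter)
{
    counter++;   /* for uint8_t, 255 + 1 wraps around to 0 */
    return counter;
}

/* Hypothetical check mirroring the "(> limit)" wording in the log. */
static int over_limit(uint8_t counter, unsigned limit)
{
    /* A limit above 255 could never be exceeded by a uint8_t counter. */
    assert(limit <= UINT8_MAX);
    return counter > limit;
}
```

Under these assumptions, `over_limit(17, 16)` is true, but `over_limit(255, 255)` is false and `bump(255)` yields 0. In other words, with a strict "greater than" test on an 8-bit counter, a limit of 255 can never be exceeded, so the setting may effectively disable the check rather than merely widen the window.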