Instead of blindly changing other configuration parameters, please first try to
find out what the PROBLEM is.

Go back to OpenSAF defaults on all settings, except IMMSV_FEVS_MAX_PENDING,
which you had increased to 255 (the maximum possible).
You said you had "managed to overcome the performance issue temporarily" by
this increase to 255. What does that mean? Do you still get the problem after
some time, or not, with only that change?

How much traffic are you generating? Not counting SYNC traffic here, I mean
YOUR application traffic. Do you have zero traffic? Obviously it is possible
to generate too much traffic on ANY configuration, and then you will end up
with symptoms like the ones you see.

If the problem appears "fixed" by the 255 (maximum) setting, try *reducing*
IMMSV_FEVS_MAX_PENDING by 50%, from 255 (the current maximum possible) down to
128. Test this for some time and see if you have a stable system. If it is
stable, repeat: reduce by 50% again, test again, and so on, until you reach a
level where the problem re-appears. Then double the value back up to the
lowest level that appeared to be stable.

This would solve the problem if the cause is that your setup has more VARIANCE
in latency, more "bursty" traffic, or more chunky scheduling of execution for
the containers/processors/processes/threads. If that is the case, then the
problem is not traffic overload, but that you indeed need some buffers to be
larger so that the extremes of the variance do not cut you off.
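Roughly, the halving procedure could look like this. This is only a sketch: it
assumes your IMMND picks up IMMSV_FEVS_MAX_PENDING from its environment and
that /etc/opensaf/immnd.conf is where that environment is set (adjust for your
installation; if your version does not read the variable, you would have to
rebuild with a changed default instead):

# Sketch: override the fevs flow-control limit via configuration, then soak-test.
# /etc/opensaf/immnd.conf is an assumed location for the IMMND environment.
echo 'export IMMSV_FEVS_MAX_PENDING=128' >> /etc/opensaf/immnd.conf
/etc/init.d/opensafd stop && /etc/init.d/opensafd start
# Soak-test with normal traffic plus the payload restart cycling shown below.
# If stable, halve to 64, 32, 16, ... until the problem re-appears,
# then double back up to the lowest value that was still stable.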
/AndersBj
________________________________
From: Adrian Szwej [mailto:[email protected]]
Sent: 16 September 2014 00:47
To: [email protected]
Subject: [tickets] [opensaf:tickets] #1072 Sync stop after few payload nodes joining the cluster (TCP)
I have also tried the following variants:
Larger MDS buffers:
export MDS_SOCK_SND_RCV_BUF_SIZE=126976
DTM_SOCK_SND_RCV_BUF_SIZE=126976
Longer keep-alive settings
OpenSAF build 4.5
MTU 9000
veth4e51 Link encap:Ethernet HWaddr aa:a6:f0:5f:0f:82
UP BROADCAST RUNNING MTU:9000 Metric:1
--
veth76a4 Link encap:Ethernet HWaddr 9a:ea:07:f4:be:55
UP BROADCAST RUNNING MTU:9000 Metric:1
--
vethb5f5 Link encap:Ethernet HWaddr 22:98:e3:39:32:34
UP BROADCAST RUNNING MTU:9000 Metric:1
--
vethb9e3 Link encap:Ethernet HWaddr d2:ec:18:c4:f9:2d
UP BROADCAST RUNNING MTU:9000 Metric:1
--
vethd703 Link encap:Ethernet HWaddr 3e:a0:49:c0:f0:73
UP BROADCAST RUNNING MTU:9000 Metric:1
--
vethf736 Link encap:Ethernet HWaddr 4e:c4:6e:74:fc:03
UP BROADCAST RUNNING MTU:9000 Metric:1
Ping during sync between containers shows a latency of 0.250-0.500 ms.
The result is the same.
I can provoke the problem by cycling start/stop of the 6th OpenSAF instance in
a Linux container:
while ( true ); do /etc/init.d/opensafd stop && /etc/init.d/opensafd start; done
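(A general Linux note on the larger socket buffer values tried above, not
something from this ticket: setsockopt() silently caps SO_SNDBUF/SO_RCVBUF
requests at the net.core.wmem_max/net.core.rmem_max sysctls, so if the MDS/DTM
settings map to those socket options, a quick check that the 126976-byte
request actually takes effect could look like this:)

# Check the kernel caps that bound SO_RCVBUF/SO_SNDBUF requests.
sysctl net.core.rmem_max net.core.wmem_max
# Raise them if they are below the requested buffer size (example values only).
sysctl -w net.core.rmem_max=262144
sysctl -w net.core.wmem_max=262144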
________________________________
[tickets:#1072]<http://sourceforge.net/p/opensaf/tickets/1072> Sync stop after few payload nodes joining the cluster (TCP)
Status: invalid
Milestone: 4.3.3
Created: Fri Sep 12, 2014 09:20 PM UTC by Adrian Szwej
Last Updated: Mon Sep 15, 2014 09:48 PM UTC
Owner: Anders Bjornerstedt
Communication is MDS over TCP. The cluster is 2+3, and the scenario is:
start the SCs; start the 1st payload; wait for sync; start the 2nd payload;
wait for sync; start the 3rd payload. The third one fails, or sometimes it is
the fourth.
There is a problem getting more than 2-3 payloads synchronized, because a bug
is triggered in a consistent way.
The following is triggered in the loading IMMND, causing the joining node to
time out and fail to start up:
Sep 6 6:58:02.096550 osafimmnd [502:immsv_evt.c:5382] T8 Received: IMMND_EVT_A2ND_SEARCHNEXT (17) from 2020f
Sep 6 6:58:02.096575 osafimmnd [502:immnd_evt.c:1443] >> immnd_evt_proc_search_next
Sep 6 6:58:02.096613 osafimmnd [502:immnd_evt.c:1454] T2 SEARCH NEXT, Look for id:1664
Sep 6 6:58:02.096641 osafimmnd [502:ImmModel.cc:1366] T2 ERR_TRY_AGAIN: Too many pending incoming fevs messages (> 16) rejecting sync iteration next request
Sep 6 6:58:02.096725 osafimmnd [502:immnd_evt.c:1676] << immnd_evt_proc_search_next
Sep 6 6:58:03.133230 osafimmnd [502:immnd_proc.c:1980] IN Sync Phase-3: step:540
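How often this flow control kicks in during a sync can be estimated with
something like the following (only a sketch; it assumes the IMMND traces are
collected in /var/log/messages, so adjust the path to wherever your traces
actually go):

# Count the fevs flow-control rejections logged during the failed sync.
grep -c 'Too many pending incoming fevs messages' /var/log/messages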
I have managed to overcome this bug temporarily by making the following patch:
+++ b/osaf/libs/common/immsv/include/immsv_api.h Sat Sep 06 08:38:16 2014 +0000
@@ -70,7 +70,7 @@
/*Max # of outstanding fevs messages towards director.*/
/*Note max-max is 255. cb->fevs_replies_pending is an uint8_t*/
-#define IMMSV_DEFAULT_FEVS_MAX_PENDING 16
+#define IMMSV_DEFAULT_FEVS_MAX_PENDING 255
#define IMMSV_MAX_OBJECTS 10000
#define IMMSV_MAX_ATTRIBUTES 128