Well, a hint is that you managed to bypass the problem (temporarily) by 
increasing a queue size.

The error:
Sep 6 6:58:02.096641 osafimmnd [502:ImmModel.cc:1366] T2 ERR_TRY_AGAIN: Too 
many pending incoming fevs messages (> 16) rejecting sync iteration next request
is very rarely seen, but it can happen when fevs traffic is generated faster 
than the fevs turnaround can drain it.
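To illustrate the mechanism: it is essentially a bounded counter of fevs 
messages still in flight, checked against IMMSV_DEFAULT_FEVS_MAX_PENDING before 
a request is accepted; once the counter exceeds the cap, the request is 
rejected with TRY_AGAIN. A minimal sketch of that idea (made-up names, not the 
actual immnd/ImmModel.cc code):

#include <stdint.h>

/* Sketch only: hypothetical names, not the real OpenSAF structures. */
#define FEVS_MAX_PENDING 16   /* cf. IMMSV_DEFAULT_FEVS_MAX_PENDING */

typedef struct {
    /* fevs messages in flight (sent but not yet turned around).
     * In the real code this is an uint8_t, hence the hard ceiling of 255. */
    uint8_t fevs_replies_pending;
} node_cb;

/* Check before accepting one more request over fevs. */
static int fevs_may_send(const node_cb *cb)
{
    /* If new requests arrive faster than the fevs turnaround can drain
     * them, the counter climbs until it exceeds the cap and the request
     * is rejected with TRY_AGAIN. */
    return cb->fevs_replies_pending <= FEVS_MAX_PENDING;
}

static void fevs_on_send(node_cb *cb)  { cb->fevs_replies_pending++; }
static void fevs_on_reply(node_cb *cb) { cb->fevs_replies_pending--; }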

So the question for you is simply why this happens in your setup and with your 
traffic, or whether there is anything else unusual about your setup or traffic.
If the *only* IMM traffic is sync traffic, then it is really strange.

Again, this is a rare problem (in fact no one has complained about this before 
that I can recall) and involves a mechanism that
has been there since the start of OpenSAF.

If the same problem had popped up in testing of 4.5, it would indicate some 
introduced problem.
But no one has reported any problem like this.

/AndersBj

________________________________
From: Adrian Szwej [mailto:[email protected]]
Sent: den 15 september 2014 12:04
To: [opensaf:tickets]
Subject: [opensaf:tickets] Re: #1072 Sync stop after few payload nodes joining 
the cluster (TCP)


I don't think it is a performance problem.
There is nothing indicating pressure on CPU load, memory, or IO bandwidth.
A simple node join seems to be enough to trigger some "logical" bug.
There is no application, just pure OpenSAF.

I am now experimenting with different MDS configuration options and MDS buffer 
settings, together with MTU 9000, just to see if there is any difference in 
triggering this bug.

OpenSAF is running inside containers, meaning there is no virtualization 
overhead.

Could you give me a hint about what could cause the outstanding messages to 
reach 16? E.g. could message loss or a timing issue lead to this?
I have more nodes configured than are actually joining at the moment, around 
10, but I am bringing them into the cluster one by one.

________________________________

[tickets:#1072]<http://sourceforge.net/p/opensaf/tickets/1072> Sync stop after 
few payload nodes joining the cluster (TCP)

Status: invalid
Milestone: 4.3.3
Created: Fri Sep 12, 2014 09:20 PM UTC by Adrian Szwej
Last Updated: Mon Sep 15, 2014 07:45 AM UTC
Owner: Anders Bjornerstedt

Communication is MDS over TCP. Cluster is 2+3, where the scenario is:
start the SCs, start the 1st payload, wait for sync, start the 2nd payload, 
wait for sync, start the 3rd payload. The third one fails, or sometimes it 
might be the fourth.

There is a problem getting more than 2 or 3 payloads synchronized, because the 
bug is triggered in a consistent way.

The following is triggered in the loading immnd, causing the joining node to 
time out and fail to start up.

Sep 6 6:58:02.096550 osafimmnd [502:immsv_evt.c:5382] T8 Received: 
IMMND_EVT_A2ND_SEARCHNEXT (17) from 2020f
Sep 6 6:58:02.096575 osafimmnd [502:immnd_evt.c:1443] >> 
immnd_evt_proc_search_next
Sep 6 6:58:02.096613 osafimmnd [502:immnd_evt.c:1454] T2 SEARCH NEXT, Look for 
id:1664
Sep 6 6:58:02.096641 osafimmnd [502:ImmModel.cc:1366] T2 ERR_TRY_AGAIN: Too 
many pending incoming fevs messages (> 16) rejecting sync iteration next request
Sep 6 6:58:02.096725 osafimmnd [502:immnd_evt.c:1676] << 
immnd_evt_proc_search_next
Sep 6 6:58:03.133230 osafimmnd [502:immnd_proc.c:1980] IN Sync Phase-3: step:540

I have managed to overcome this bug temporarily by making the following patch:

+++ b/osaf/libs/common/immsv/include/immsv_api.h        Sat Sep 06 08:38:16 
2014 +0000
@@ -70,7 +70,7 @@

 /*Max # of outstanding fevs messages towards director.*/
 /*Note max-max is 255. cb->fevs_replies_pending is an uint8_t*/
-#define IMMSV_DEFAULT_FEVS_MAX_PENDING 16
+#define IMMSV_DEFAULT_FEVS_MAX_PENDING 255

 #define IMMSV_MAX_OBJECTS 10000
 #define IMMSV_MAX_ATTRIBUTES 128
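
If I read the log right, the rejected searchNext surfaces to its originator as 
SA_AIS_ERR_TRY_AGAIN, and on the OM API side the usual handling of that code 
is a back-off/retry loop rather than an immediate failure. A minimal sketch of 
such a loop (illustrative only, not the actual sync iterator code; the retry 
count and delay are arbitrary):

#include <saImmOm.h>
#include <unistd.h>

/* Sketch: iterate with saImmOmSearchNext_2(), backing off on
 * SA_AIS_ERR_TRY_AGAIN instead of treating it as a hard failure.
 * Assumes an already initialized searchHandle; other error handling
 * is omitted for brevity. */
static SaAisErrorT search_next_with_retry(SaImmSearchHandleT searchHandle,
                                          SaNameT *objectName,
                                          SaImmAttrValuesT_2 ***attributes)
{
    SaAisErrorT rc;
    int retries = 0;

    do {
        rc = saImmOmSearchNext_2(searchHandle, objectName, attributes);
        if (rc != SA_AIS_ERR_TRY_AGAIN)
            break;
        usleep(100 * 1000);   /* back off 100 ms before retrying */
    } while (++retries < 50); /* give up after roughly 5 seconds */

    return rc;
}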

