Hi Adrian,
I have re-opened the ticket and changed the component to MDS.
The MDS responsible may be able to diagnose the cause from the
coredump alone.
I have not checked the MDS backlog for any older ticket
documenting similar symptoms:
https://sourceforge.net/p/opensaf/tickets/search/?q=status%3A%28unassigned+accepted+assigned+review%29+AND+_component%3A%28mds+dtm%29
I will leave that to the MDS responsible.
/AndesBj
Adrian Szwej wrote:
> It is the IMMD that is crashing, causing the messages to become pending.
> I am attaching the coredump and the immnd and immd trace files from SC-1, where
> 7 nodes join one by one. When PL-8 joins, the IMMD coredumps.
>
> The code used was changeset 5828:df7bef2079b1, with
> IMMSV_DEFAULT_FEVS_MAX_PENDING changed to 255.
>
>
>
> ---
>
> ** [tickets:#1072] Sync stop after few payload nodes joining the cluster
> (TCP)**
>
> **Status:** invalid
> **Milestone:** 4.3.3
> **Created:** Fri Sep 12, 2014 09:20 PM UTC by Adrian Szwej
> **Last Updated:** Mon Sep 15, 2014 10:46 PM UTC
> **Owner:** Anders Bjornerstedt
>
> Communication is MDS over TCP. The cluster is 2+3, and the scenario is:
> start the SCs; start one payload; wait for sync; start a second payload; wait
> for sync; start a third payload. The third one fails, or sometimes the fourth.
>
> There is a problem getting more than two or three payloads synchronized,
> because the bug is triggered in a consistent way.
>
> The following is triggered in the loading immnd, causing the joining node to
> time out and fail to start up.
>
> Sep 6 6:58:02.096550 osafimmnd [502:immsv_evt.c:5382] T8 Received:
> IMMND_EVT_A2ND_SEARCHNEXT (17) from 2020f
> Sep 6 6:58:02.096575 osafimmnd [502:immnd_evt.c:1443] >>
> immnd_evt_proc_search_next
> Sep 6 6:58:02.096613 osafimmnd [502:immnd_evt.c:1454] T2 SEARCH NEXT, Look
> for id:1664
> Sep 6 6:58:02.096641 osafimmnd [502:ImmModel.cc:1366] T2 ERR_TRY_AGAIN: Too
> many pending incoming fevs messages (> 16) rejecting sync iteration next
> request
> Sep 6 6:58:02.096725 osafimmnd [502:immnd_evt.c:1676] <<
> immnd_evt_proc_search_next
> Sep 6 6:58:03.133230 osafimmnd [502:immnd_proc.c:1980] IN Sync Phase-3:
> step:540
>
> I have managed to overcome this bug temporarily with the following patch:
>
> +++ b/osaf/libs/common/immsv/include/immsv_api.h Sat Sep 06 08:38:16 2014 +0000
> @@ -70,7 +70,7 @@
>
> /*Max # of outstanding fevs messages towards director.*/
> /*Note max-max is 255. cb->fevs_replies_pending is an uint8_t*/
> -#define IMMSV_DEFAULT_FEVS_MAX_PENDING 16
> +#define IMMSV_DEFAULT_FEVS_MAX_PENDING 255
>
> #define IMMSV_MAX_OBJECTS 10000
> #define IMMSV_MAX_ATTRIBUTES 128
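---

Below is a minimal sketch, in plain compilable C, of the kind of gate that
produces the ERR_TRY_AGAIN in the trace above. The names fevs_gate_cb and
fevs_can_send are hypothetical illustrations, not the actual OpenSAF code;
the real counter is cb->fevs_replies_pending in the immnd control block, as
the quoted header comment notes. The sketch also shows why 255 is the hard
ceiling for IMMSV_DEFAULT_FEVS_MAX_PENDING: the counter is a uint8_t.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Default from the quoted header; the workaround patch raises it to 255. */
#define IMMSV_DEFAULT_FEVS_MAX_PENDING 16

/* Hypothetical control block; the real one is the immnd cb. */
typedef struct {
    uint8_t fevs_replies_pending; /* outstanding fevs messages toward the director */
    uint8_t fevs_max_pending;     /* configured limit; the uint8_t type caps it at 255 */
} fevs_gate_cb;

/* Returns true if another fevs message may be sent. When it returns false,
 * the caller answers the client with ERR_TRY_AGAIN, which is what rejects
 * the sync iteration next request in the trace above. */
static bool fevs_can_send(const fevs_gate_cb *cb)
{
    return cb->fevs_replies_pending < cb->fevs_max_pending;
}

int main(void)
{
    fevs_gate_cb cb = { .fevs_replies_pending = 16,
                        .fevs_max_pending = IMMSV_DEFAULT_FEVS_MAX_PENDING };

    if (!fevs_can_send(&cb))
        printf("ERR_TRY_AGAIN: too many pending incoming fevs messages (limit %u)\n",
               (unsigned)cb.fevs_max_pending);
    return 0;
}

Note that raising the limit only widens the window rather than removing the
gate, which is consistent with the patch above being a temporary workaround.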