increase fail_to_recv_const in the configuration. Right now it's probably
about 30.
It represents the number of rotations where a message should have been
received via multicast but it was not.
It is usually a sign of an overloaded network (as you mentioned), possibly
compounded by a poorly designed switch.
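For reference, a minimal sketch of where that constant would be raised,
assuming a stock openais.conf with a totem section (the directive spelling
and default vary by version; some releases spell it fail_recv_const, so
check the openais.conf(5) man page for yours; the value 100 below is only
an illustrative guess):

```
# /etc/ais/openais.conf (fragment) -- illustrative values only
totem {
        version: 2
        # Number of token rotations during which a message should have
        # been received but was not, before the node logs FAILED TO
        # RECEIVE and re-enters the gather state. Raising it makes
        # membership less trigger-happy on a lossy network, at the cost
        # of slower detection of genuinely failed nodes.
        fail_recv_const: 100
}
```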
As far as the TODO goes, which version of openais are you using? I believe I
fixed this in whitetank, and the trunk forward-port should also have this
problem cleared up.
regards
-steve
On Wed, Jun 18, 2008 at 6:36 PM, Angus & Anna Salkeld <[EMAIL PROTECTED]>
wrote:
> Hi
>
> In a VERY busy network (storm conditions) I am getting the following
> log messages continuously:
>
> 01:19:17 awplus openais[1474]: [TOTEM] FAILED TO RECEIVE
> 01:19:17 awplus openais[1474]: [amf.c:1365] >amf_confchg_fn: mnum: 2,
> jnum: 0, lnum: 0, sync state: NORMAL_OPERATION, ring ID 356 rep
> 192.168.255.1
> 01:19:17 awplus openais[1474]: [amf.c:1365] >amf_confchg_fn: mnum: 2,
> jnum: 0, lnum: 0, sync state: NORMAL_OPERATION, ring ID 356 rep
> 192.168.255.1
> 01:19:17 awplus openais[1474]: [SYNC ] This node is within the primary
> component and will provide service.
> 01:19:17 awplus openais[1474]: [TOTEM] entering OPERATIONAL state.
> 01:19:24 awplus openais[1474]: [amf.c:1365] >amf_confchg_fn: mnum: 2,
> jnum: 0, lnum: 0, sync state: NORMAL_OPERATION, ring ID 360 rep
> 192.168.255.1
> 01:19:24 awplus openais[1474]: [amf.c:1365] >amf_confchg_fn: mnum: 2,
> jnum: 0, lnum: 0, sync state: NORMAL_OPERATION, ring ID 360 rep
> 192.168.255.1
> 01:19:24 awplus openais[1474]: [SYNC ] This node is within the primary
> component and will provide service.
> 01:19:24 awplus openais[1474]: [TOTEM] entering OPERATIONAL state.
>
>
> In exec/totemsrp.c line ~3333 there is the following TODO:
>
> if (instance->my_aru_count > instance->totem_config->fail_to_recv_const &&
>     token->aru_addr != instance->my_id.addr[0].nodeid) {
>
>         log_printf (instance->totemsrp_log_level_error,
>                 "FAILED TO RECEIVE\n");
>         // TODO if we fail to receive, it may be possible to end with a gather
>         // state of proc == failed = 0 entries
>         /* THIS IS A BIG TODO
>         memb_set_merge (&token->aru_addr, 1,
>                 instance->my_failed_list,
>                 &instance->my_failed_list_entries);
>         */
>
>         ring_state_restore (instance);
>
>         memb_state_gather_enter (instance, 6);
> } else {
>
> My questions are:
> 1] Am I right in the following:
> - totem thinks that it has lost a node
> - it sends a member join message
> - the member rejoins quite happily
> - the above sequence repeats
>
> 2] If the above is true, what can I do to prevent the state flap
> within this horrible network?
>
> 3] What needs to be done in the TODO (the comment is a bit cryptic to me)?
>
> Thanks
> Angus Salkeld
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
>