On Fri, Feb 19, 2010 at 10:02 AM, Digimer <[email protected]> wrote:
> Hi all,
>
>   I've been reading through the man page and have been struggling to
> understand the relationship of the redundant ring protocol options. I
> think I understand now, but would be grateful if someone could confirm
> that I've got it right or not.
>
> rrp_problem_count_timeout
>
>   Two purposes;
>
>   - When no errors are seen for this many milliseconds,
> rrp_problem_count_threshold is decremented by 1.
>
>   - While an error exists, this many milliseconds is the upper limit
> before the interface is declared bad.
>
>   How is this different from 'token'?
>
>
> rrp_problem_count_threshold
>
>   - Starts at '0' and is increased by 1 every rrp_token_expired_timeout
> milliseconds without receiving a token.
>
>   - Counts down by 1 every rrp_problem_count_timeout milliseconds without
> a problem
>
>   How is this different from 'fail_to_recv_const'?
>
> rrp_token_expired_timeout
>
>   - This is the maximum time that can pass without receiving a token
> before triggering an increment of rrp_problem_count_threshold.
>
>   How is this different from 'max_network_delay'?
>
> I am sure I am misunderstanding something here. :)
>

RRP has two modes, active and passive. rrp_problem_count_timeout and
rrp_token_expired_timeout are used only in active mode.
rrp_problem_count_threshold is used in both active and passive mode; it
is a constant and never changes.

In active mode, if the token does not arrive within
rrp_token_expired_timeout, an internal problem_count is increased by 1.
If the token arrives within rrp_problem_count_timeout, the internal
problem_count is decreased by 1. While this goes on, if problem_count
becomes greater than or equal to the configured
rrp_problem_count_threshold, that interface is marked FAULTY and won't
be used any more until the administrator fixes it.

In passive mode, corosync maintains a token_recv_count and an
mcast_recv_count. Whenever a token or mcast message is received, the
corresponding count is increased by 1 and compared with the count of
the other interface. If the difference is more than
rrp_problem_count_threshold, the interface with the smaller count
becomes FAULTY and won't be used any more until the administrator fixes
it.

All of the above is there to detect a faulty interface.

The token lost timeout, on the other hand, is mainly there to detect a
lost node. When the token lost timeout expires, corosync enters GATHER
mode to find out which nodes are currently there.

fail_to_recv_const is mainly there to detect a faulty node that fails
to receive a message.
Corosync (or the cluster) cannot wait forever in this situation, so it
enters GATHER mode if a node fails to receive a message for
fail_to_recv_const token rotations.

Thanks
hj

--
Peakpoint Service
Cluster Setup, Troubleshooting & Development
[email protected]
(303) 997-2823
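
P.S. For anyone who finds this easier to read as code, here is a rough
Python sketch of the two interface fault checks described above. It is
not corosync's actual implementation (which is written in C). The names
ActiveRing, token_expired, token_on_time and passive_faulty are made up
for illustration, and two details are my own assumptions: problem_count
is never allowed to drop below zero, and in passive mode each interface
is compared against the highest count seen.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ActiveRing:
    """Hypothetical model of one interface in active mode (not corosync code)."""
    threshold: int          # rrp_problem_count_threshold: configured, never changes
    problem_count: int = 0  # internal counter, this is what goes up and down
    faulty: bool = False

    def token_expired(self) -> None:
        # The token did not arrive within rrp_token_expired_timeout.
        if self.faulty:
            return
        self.problem_count += 1
        if self.problem_count >= self.threshold:
            self.faulty = True  # stays FAULTY until the administrator fixes it

    def token_on_time(self) -> None:
        # The token arrived within rrp_problem_count_timeout.
        # Assumption: the counter is clamped at zero.
        if not self.faulty and self.problem_count > 0:
            self.problem_count -= 1


def passive_faulty(recv_counts: List[int], threshold: int) -> List[bool]:
    """Passive mode: flag every interface whose receive count lags the
    highest count by more than rrp_problem_count_threshold."""
    highest = max(recv_counts)
    return [highest - count > threshold for count in recv_counts]


ring = ActiveRing(threshold=3)
ring.token_expired()   # problem_count = 1
ring.token_on_time()   # problem_count = 0
for _ in range(3):     # three misses in a row reach the threshold
    ring.token_expired()
print(ring.faulty)                                # True
print(passive_faulty([120, 105], threshold=10))   # [False, True]
```

Note that the threshold itself is never modified in either mode; only
the internal counters move.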
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
