On Tue, 2010-02-16 at 15:34 -0700, hj lee wrote:
> Hi,
>
> I wondered why token retransmissions happen from time to time
> in my very idle cluster. When it happens, the retransmission occurs
> right after (1 ms later) the token was sent. The token retransmit
> timeout is 137 ms in my cluster. I found that whenever a HOLD_CANCEL
> message is created in the ring representative, it causes an
> unnecessary token retransmission. In most cases two tokens are sent
> in a row within 1 ms.
>
> message_handler_token_hold_cancel() calls
> timer_function_token_retransmit_timeout(), which causes one token
> transmission. The token is sent again when the token_hold timer
> expires. The two events can happen in either order depending on
> timing. So to fix this issue, message_handler_token_hold_cancel()
> should cancel the token_hold timer and call
> timer_function_token_hold_retransmit_timeout().
>
> Sending one more token is OK, but if this happens in a heavily loaded
> cluster, it could trigger a token loss timeout, and the cluster could
> be split.
>
HJ,

Logic is sound. Can you work up a patch?

Regards
-steve

> Thank
> hj
>
> --
> Peakpoint Service
>
> Cluster Setup, Troubleshooting & Development
> [email protected]
> (303) 997-2823
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
