On Tue, 2010-02-16 at 15:34 -0700, hj lee wrote:
> Hi,
> 
> I wondered why token retransmissions happen from time to time
> in my very idle cluster. When it happens, the retransmission occurs
> right after (1 ms later) the token was sent. The token retransmit
> timeout is 137 ms in my cluster. I found that whenever a HOLD_CANCEL
> message is created on the ring representative, it causes an unnecessary
> token retransmission. In most cases two tokens are sent in a row within 1 ms.
> 
> message_handler_token_hold_cancel() calls
> timer_function_token_retransmit_timeout(), which causes one token
> transmit. The token is sent again when the token_hold timer expires. The
> two events can happen in either order depending on timing. So to fix this
> issue, message_handler_token_hold_cancel() should cancel the token_hold
> timer and call timer_function_token_hold_retransmit_timeout().
> 
> Sending one more token is OK, but if this happens in a heavily loaded
> cluster, it could trigger a token loss timeout, and the cluster could
> be divided.
> 
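
For readers following the thread, the double send described above can be
illustrated with a toy model. This is only a sketch in Python, not the
corosync C code: the RingRep class, the simulate() helper, the 100 ms hold
interval and the 99 ms arrival time are all assumptions made up for the
illustration. It models one of the two possible orderings (HOLD_CANCEL
arriving just before the token_hold timer expires) and shows that cancelling
the timer first leaves a single token send.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RingRep:
    """Toy stand-in for the ring representative (hypothetical, not corosync)."""
    cancel_hold_timer: bool                 # True models the proposed fix
    hold_expiry_ms: Optional[int] = None    # None means the timer is not armed
    sent_at_ms: List[int] = field(default_factory=list)

    def start_token_hold(self, now_ms: int, hold_ms: int) -> None:
        # Arm the token_hold timer.
        self.hold_expiry_ms = now_ms + hold_ms

    def send_token(self, now_ms: int) -> None:
        # Record the time of each token transmit.
        self.sent_at_ms.append(now_ms)

    def on_hold_cancel(self, now_ms: int) -> None:
        # With the fix, disarm the pending hold timer before sending.
        if self.cancel_hold_timer:
            self.hold_expiry_ms = None
        self.send_token(now_ms)

    def tick(self, now_ms: int) -> None:
        # Fire the hold timer if it is armed and has expired.
        if self.hold_expiry_ms is not None and now_ms >= self.hold_expiry_ms:
            self.hold_expiry_ms = None
            self.send_token(now_ms)


def simulate(cancel_hold_timer: bool) -> List[int]:
    """Return the times (ms) at which tokens were sent."""
    rep = RingRep(cancel_hold_timer=cancel_hold_timer)
    rep.start_token_hold(now_ms=0, hold_ms=100)   # arbitrary hold interval
    for now_ms in range(200):
        if now_ms == 99:                          # HOLD_CANCEL arrives here
            rep.on_hold_cancel(now_ms)
        rep.tick(now_ms)
    return rep.sent_at_ms


if __name__ == "__main__":
    print("timer left armed:", simulate(cancel_hold_timer=False))
    print("timer cancelled: ", simulate(cancel_hold_timer=True))
```

With the timer left armed the model sends tokens at 99 ms and 100 ms, 1 ms
apart as in the report; with the timer cancelled it sends only at 99 ms.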

HJ,

Logic is sound.

Can you work up a patch?

Regards
-steve

> Thanks
> hj
> 
> -- 
> Peakpoint Service
> 
> Cluster Setup, Troubleshooting & Development
> [email protected]
> (303) 997-2823
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
