Hi,

I wondered why token retransmissions happen from time to time in my
very idle cluster. When it happens, the retransmission comes right after
(1 ms later) the token was sent, even though the token retransmit timeout is
137 ms in my cluster. I found that whenever a HOLD_CANCEL message is created
on the ring representative, it causes an unnecessary token retransmission. In
most cases two tokens are sent in a row within 1 ms.

message_handler_token_hold_cancel() calls
timer_function_token_retransmit_timeout(), which causes one token transmit.
The token is then sent again when the token_hold timer expires. The two events
can happen in either order, depending on timing. So to fix this issue,
message_handler_token_hold_cancel() should cancel the token_hold timer and call
timer_function_token_hold_retransmit_timeout().
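
To make the double send concrete, here is a minimal toy model in C. It is
only a sketch and not the totemsrp code: every name in it (toy_ring,
toy_send_token, toy_hold_cancel_current, and so on) is hypothetical, and it
only covers the ordering where HOLD_CANCEL is handled before the token_hold
timer expires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the totemsrp API. */
struct toy_ring {
    bool hold_timer_armed;   /* a token_hold timer is still pending */
    int tokens_sent;         /* token transmissions counted so far */
};

/* Stand-in for the actual token transmission. */
void toy_send_token(struct toy_ring *ring)
{
    assert(ring != NULL);
    ring->tokens_sent++;
}

/* Stand-in for the token_hold timer expiring: sends the held token. */
void toy_hold_timer_expire(struct toy_ring *ring)
{
    if (ring->hold_timer_armed) {
        ring->hold_timer_armed = false;
        toy_send_token(ring);
    }
}

/* Behaviour described above: the token goes out, the hold timer stays armed. */
void toy_hold_cancel_current(struct toy_ring *ring)
{
    toy_send_token(ring);
}

/* Proposed behaviour: disarm the hold timer first, then send the token once. */
void toy_hold_cancel_proposed(struct toy_ring *ring)
{
    ring->hold_timer_armed = false;
    toy_send_token(ring);
}

/* Handles one HOLD_CANCEL, then lets the hold timer expire; returns tokens sent. */
int toy_run(void (*on_hold_cancel)(struct toy_ring *))
{
    struct toy_ring ring = { .hold_timer_armed = true, .tokens_sent = 0 };

    on_hold_cancel(&ring);
    toy_hold_timer_expire(&ring);
    return ring.tokens_sent;
}
```

In this model toy_run(toy_hold_cancel_current) counts two transmissions and
toy_run(toy_hold_cancel_proposed) counts one, which is the difference the
proposed change is meant to make.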

Sending one extra token is OK by itself, but if this happens in a heavily
loaded cluster, it could trigger the token loss timeout, and the cluster could
be divided.

Thanks,
hj

-- 
Peakpoint Service

Cluster Setup, Troubleshooting & Development
[email protected]
(303) 997-2823
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais
