Re: Member isn't responding to heartbeat requests

Bruce Schuchardt Mon, 25 Feb 2019 07:56:49 -0800

In a distributed system, nodes (servers, locators) are continually watching other nodes to ensure that something bad hasn't happened. One of the ways this is done in Geode is for each node to watch one other node and expect periodic signs that it's still alive. This is done through TCP messaging: any message from the node being watched counts as proof that it's still alive. If no messages are seen within the "member-timeout" period (see Distributed System settings, default 5000 ms), a "heartbeat" is requested over UDP. If no message is received within another "member-timeout" interval, we attempt to contact the suspect directly over a TCP/IP connection, requesting that it verify its identity. If this fails, the suspect is kicked out of the cluster.
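The escalation sequence above can be modeled as a small state machine. The following Java sketch is a simplified, single-threaded illustration only: the class `SuspectWatcher`, its methods, and the `finalCheck` callback (standing in for the TCP identity check) are hypothetical names, not Geode's actual implementation, which runs concurrently with real network I/O. It also assumes that a successful final check is treated like a fresh message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical, simplified model of the watch/escalation sequence; not Geode's code.
class SuspectWatcher {
    private final long memberTimeoutMs;
    private long lastMessageMs;          // time of the last message from the watched member
    private boolean heartbeatRequested;  // true once a UDP heartbeat request has been sent
    private long heartbeatRequestMs;     // when that request was sent
    private boolean removed;             // true once the member is kicked out
    private final List<String> log = new ArrayList<>();

    SuspectWatcher(long memberTimeoutMs, long nowMs) {
        this.memberTimeoutMs = memberTimeoutMs;
        this.lastMessageMs = nowMs;
    }

    // Any message from the watched member counts as proof that it is alive.
    void onMessage(long nowMs) {
        lastMessageMs = nowMs;
        heartbeatRequested = false;
    }

    // Called periodically; finalCheck stands in for the TCP identity check.
    void tick(long nowMs, BooleanSupplier finalCheck) {
        if (removed) {
            return;
        }
        if (!heartbeatRequested) {
            // Step 1: silent for a full member-timeout, so ask for a heartbeat (UDP).
            if (nowMs - lastMessageMs >= memberTimeoutMs) {
                heartbeatRequested = true;
                heartbeatRequestMs = nowMs;
                log.add(nowMs + " ms: request heartbeat over UDP");
            }
        } else if (nowMs - heartbeatRequestMs >= memberTimeoutMs) {
            // Step 2: still silent after another member-timeout, so try TCP directly.
            log.add(nowMs + " ms: final check over TCP");
            if (finalCheck.getAsBoolean()) {
                onMessage(nowMs); // assumption: a verified identity counts as a message
            } else {
                removed = true;   // Step 3: the suspect is kicked out of the cluster
                log.add(nowMs + " ms: remove member from cluster");
            }
        }
    }

    boolean isRemoved() { return removed; }

    List<String> log() { return log; }
}
```

With the default member-timeout of 5000 ms and a member that stays silent from time 0, `tick` sends the heartbeat request at 5000 ms and, if the final check fails, removes the member at 10000 ms.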

So, you could increase your member-timeout setting, or investigate why messages, especially heartbeats, aren't being received. A TCP/IP performance-measuring tool might help in that regard: run one to see what the packet-loss percentage is, and if it's high, look into why that's happening.
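Raising member-timeout is a trade-off: by the sequence described above, a silent member is tolerated for roughly two member-timeout intervals before the final TCP check, so a larger value also delays removal of a member that has genuinely failed. A minimal Java sketch of that arithmetic, assuming the setting is read from a Java properties file such as gemfire.properties. The class and method names are hypothetical, and the estimate ignores how long the final check itself takes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; only the property name and its 5000 ms default come from the post.
final class MemberTimeoutMath {
    static final long DEFAULT_MEMBER_TIMEOUT_MS = 5000;

    private MemberTimeoutMath() {}

    // Parses properties text such as "member-timeout=10000".
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the configured member-timeout, falling back to the default.
    static long memberTimeoutMs(Properties props) {
        String value = props.getProperty("member-timeout");
        return value == null ? DEFAULT_MEMBER_TIMEOUT_MS : Long.parseLong(value.trim());
    }

    // Silence tolerated before the final TCP check: one interval before the UDP
    // heartbeat request plus one more interval waiting for a reply.
    static long silenceBeforeFinalCheckMs(long memberTimeoutMs) {
        return 2 * memberTimeoutMs;
    }
}
```

For example, `member-timeout=10000` gives about 20000 ms of tolerated silence, versus about 10000 ms with the default.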

It's also possible that garbage collection is kicking in on the member that "isn't responding to heartbeat requests", or that it isn't getting enough CPU for other reasons.
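One rough way to check for this on the suspect member is to measure how late a thread wakes up from a short sleep: long overshoots suggest the JVM was paused (for example by garbage collection) or starved of CPU. The Java sketch below is an illustration under that assumption, not a Geode feature; `PauseProbe` and its methods are hypothetical names. It also sums the collection times reported by the JVM's standard `GarbageCollectorMXBean`s.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical diagnostic helper; not part of Geode.
final class PauseProbe {
    private PauseProbe() {}

    // Sleeps `iterations` times for `intervalMs` each and returns the largest
    // extra delay (ms) beyond the requested sleep. Large values hint at GC
    // pauses or CPU starvation affecting this JVM.
    static long maxStallMs(long intervalMs, int iterations) {
        long intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMs);
        long worstNanos = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                break;
            }
            long overshoot = System.nanoTime() - start - intervalNanos;
            worstNanos = Math.max(worstNanos, overshoot);
        }
        return TimeUnit.NANOSECONDS.toMillis(worstNanos);
    }

    // Total time (ms) the JVM reports spending in garbage collection so far.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }
}
```

In practice such a probe would run on a background thread of the affected member, with stalls approaching member-timeout compared against the times of the heartbeat failures.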


On 2/25/19 2:39 AM, Avital Amity wrote:

Hi,
I have an environment where my servers and locator go down from time to time with the error below:
Member isn't responding to heartbeat requests
Any suggestions regarding relevant configuration or other things to check? What can lead to this issue?
Thanks

Avital
