Typically, when I've seen that gossip issue, it requires more than just
restarting the affected node to fix. If you're not getting query-related
errors in the server log, you should start looking at what is being queried.
Are the queries that time out each day the same?
I can't say that I have tried that while the issue is going on, but I have
done such rolling restarts for sure, and the timeouts still occur every
day. What would a rolling restart do to fix the issue?
In fact, as I write this, I am restarting each node one by one in the
eu-west-1 datacenter, and
Have you tried a rolling restart of the entire DC?
Hi there -
Cluster info:
C* 3.9, replicated across 4 EC2 regions (us-east-1, us-west-2, eu-west-1,
ap-southeast-1), c4.4xlarge
Around the same time every day (~7-8am EST), two DCs (eu-west-1 and
ap-southeast-1) in our cluster start experiencing a high number of timeouts
(Connection.TotalTimeouts