Azure's networking uses aggressively low idle timeouts for its TCP connections. Ignore the Mongo parts of this link and look at the OS settings they change.
https://docs.mongodb.com/ecosystem/platforms/windows-azure/

---------------------------------------------------
Cliff Gilmore
Vanguard Solutions Architect
M: 314-825-4413
DataStax, Inc. | www.DataStax.com

On Thu, Oct 27, 2016 at 5:48 AM, Vlad <qa23d-...@yahoo.com> wrote:
> Hello,
>
> I put a two-node cluster on Azure. Each node is in its own DC (ping about
> 10 ms), and the inter-node connection (SSL port 7001) goes through the
> external IPs, i.e.
>
> listen_interface: eth0
> broadcast_address: 1.1.1.1
>
> The cluster starts, cqlsh can connect, and the stress tool survives a night
> of writes with replication factor two, so all seems to be fine. But when
> the cluster is left without load, it becomes nonfunctional after several
> minutes of idle time. Attempts to connect fail with the error
>
> Connection error: ('Unable to connect to any servers', {'1.1.1.1':
> OperationTimedOut('errors=Timed out creating connection (10 seconds),
> last_host=None',)})
>
> There is a message
>
> WARN 10:06:32 RequestExecutionException READ_TIMEOUT: Operation timed
> out - received only 1 responses.
>
> on one node six minutes after start (no load or connections in that time).
>
> nodetool status shows both nodes as UN (Up and Normal, I guess).
>
> I suspected a connectivity problem, but tcpdump shows constant traffic on
> port 7001 between the nodes. Restarting the OTHER node (not the one I'm
> connecting to) solves the problem for another several minutes. I increased
> the TCP idle timeout in the Azure IP address settings to 30 minutes, but it
> had no effect.
>
> Thanks,
> Vlad
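A minimal Python sketch for checking a Linux node against the advice above: it reads the kernel-wide TCP keepalive settings from /proc/sys/net/ipv4 and checks whether the first keepalive probe would be sent before an idle timeout expires. The 240-second timeout is an assumption, not a value from this thread; substitute whatever timeout actually applies to your Azure setup. For reference, the Linux default for tcp_keepalive_time is 7200 seconds, far longer than the several minutes of idle time described in the report. These kernel settings only affect sockets that enable SO_KEEPALIVE.

```python
from pathlib import Path
from typing import Dict, Optional

# Linux exposes the kernel-wide TCP keepalive settings under this directory.
PROC_DIR = Path("/proc/sys/net/ipv4")

# Assumption: idle timeout (seconds) after which the network path silently
# drops a connection. Replace with the value your Azure configuration uses.
ASSUMED_IDLE_TIMEOUT_S = 240


def read_keepalive_settings(proc_dir: Path = PROC_DIR) -> Optional[Dict[str, int]]:
    """Return the kernel keepalive settings, or None if they cannot be read."""
    names = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")
    try:
        return {name: int((proc_dir / name).read_text().strip()) for name in names}
    except (OSError, ValueError):
        # Missing files (e.g. not a Linux host) or unexpected contents.
        return None


def first_probe_beats_timeout(keepalive_time_s: int, idle_timeout_s: int) -> bool:
    """True if the first keepalive probe is sent before the idle timeout expires."""
    return keepalive_time_s < idle_timeout_s


if __name__ == "__main__":
    settings = read_keepalive_settings()
    if settings is None:
        print("Keepalive settings not found (not a Linux host?)")
    else:
        ok = first_probe_beats_timeout(
            settings["tcp_keepalive_time"], ASSUMED_IDLE_TIMEOUT_S
        )
        print(f"settings={settings} first probe before idle timeout: {ok}")
```

If the check fails, lowering the value (for example with `sysctl -w net.ipv4.tcp_keepalive_time=120`, where 120 is only an illustrative number) is the kind of OS-level change the linked page is about.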