Can you tell why the server wasn't responding to the notifications from the 
observer? The log file is from the observer, and it looks like the observer is 
able to send messages out, but it isn't clear why the server isn't responding.

-Flavio

> On 14 Oct 2015, at 01:51, elastic search <[email protected]> wrote:
> 
> 
> Hello Experts
> 
> We have 2 Observers running in AWS, connecting to a local ZK ensemble in 
> our own data center.
> 
> There have been instances where the network between the two sites drops for 
> about a minute.
> However, the Observers take around 15 minutes to recover even when the 
> network outage lasts only a minute.
> 
> From the logs:
> java.net.SocketTimeoutException: Read timed out
> 2015-10-13 22:26:03,927 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 400
> 2015-10-13 22:26:04,328 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 800
> 2015-10-13 22:26:05,129 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 1600
> 2015-10-13 22:26:06,730 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 3200
> 2015-10-13 22:26:09,931 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 6400
> 2015-10-13 22:26:16,332 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 12800
> 2015-10-13 22:26:29,133 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 25600
> 2015-10-13 22:26:54,734 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 51200
> 2015-10-13 22:27:45,935 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:28:45,936 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:29:45,937 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:30:45,938 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:31:45,939 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:32:45,940 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:33:45,941 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:34:45,942 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:35:45,943 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:36:45,944 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:37:45,945 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:38:45,946 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:39:45,947 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:40:45,948 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 2015-10-13 22:41:45,949 [myid:4] - INFO  
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - 
> Notification time out: 60000
> 
> The observer then finally exits the QuorumCnxManager run loop with the 
> following message:
> WARN  [RecvWorker:2:QuorumCnxManager$RecvWorker@780] - Connection broken for 
> id 2
> 
> How can we ensure the observer does not go out of service for such a long 
> duration?
> 
> The full logs are attached.
> 
> Please help
> Thanks
> 
> <zookeeper.log.zip>
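
The "Notification time out" values in the quoted log double on every unanswered round (400, 800, ..., 51200) and then stay at 60000 ms. As a rough illustration of that pattern only, here is a minimal sketch. It is not ZooKeeper's code: the function name is hypothetical, and the 200 ms starting value and 60000 ms cap are assumptions inferred from the logged numbers.

```python
def notification_timeouts(initial_ms=200, cap_ms=60000, rounds=9):
    """Return the timeout logged after each unanswered round.

    initial_ms and cap_ms are assumptions inferred from the quoted log
    (first logged value 400, values level off at 60000).
    """
    timeout = initial_ms
    logged = []
    for _ in range(rounds):
        # Double the wait after each round with no reply, up to the cap.
        timeout = min(timeout * 2, cap_ms)
        logged.append(timeout)
    return logged


timeouts = notification_timeouts()
print(timeouts)

# Time spent waiting before the timeout first reaches the cap.
time_to_cap_ms = sum(t for t in timeouts if t < 60000)
print(time_to_cap_ms)
```

Under these assumptions the sketch reproduces the logged sequence, and the waits below the cap add up to 102000 ms, which matches the roughly 102 seconds between the first log line (22:26:03,927) and the first 60000 entry (22:27:45,935). After that point the observer only retries once a minute, so each retry that goes unanswered adds a full minute to the recovery time.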
