Hello again, I don't think it is a good idea to start a new thread with the same issue.

Could this be a DNS resolution caching problem? See
https://issues.apache.org/jira/browse/ZOOKEEPER-1506

The new server has the lowest sid. It is able to connect to all the other
servers, but the rest of the servers don't seem to be able to connect to it.
Connections from this server to the rest are useless, since they are dropped
because of the sid comparison that you see in the log.

You could try changing the server addresses in the configuration to the AWS
public IP addresses of the peers, just to test whether that works. Or try
replacing the server with the highest sid; that should also work.

Otherwise (assuming the problem is DNS resolution), the only current
workaround that I can think of is the rolling restart, as you have noticed.

On Wed, Nov 6, 2013 at 9:51 AM, Bae, Jae Hyeon <[email protected]> wrote:

> Hi Zookeeper users
>
> With the same zoo.cfg, new server with empty zk data directory cannot join
> quorum with the same IP, same version of zk and the port. I didn't see any
> significant error messages but the following lines repeated:
>
> 2013-11-05 17:42:08,287 - INFO [QuorumPeer[myid=1]/0.0.0.0:2181
> :QuorumPeer@670] - LOOKING
> 2013-11-05 17:42:08,290 - INFO [QuorumPeer[myid=1]/0.0.0.0:2181
> :FastLeaderElection@740] - New election. My id = 1, proposed zxid=0x0
> 2013-11-05 17:42:08,293 - INFO
> [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 1
> (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0
> (n.peerEPoch), LOOKING (my state)
> 2013-11-05 17:42:08,301 - INFO [WorkerSender[myid=1]:QuorumCnxManager@190
> ]
> - Have smaller server identifier, so dropping the connection: (2, 1)
> 2013-11-05 17:42:08,304 - INFO [WorkerSender[myid=1]:QuorumCnxManager@190
> ]
> - Have smaller server identifier, so dropping the connection: (3, 1)
> 2013-11-05 17:42:08,308 - INFO [WorkerSender[myid=1]:QuorumCnxManager@190
> ]
> - Have smaller server identifier, so dropping the connection: (4, 1)
> 2013-11-05 17:42:08,311 - INFO [WorkerSender[myid=1]:QuorumCnxManager@190
> ]
> - Have smaller server identifier, so dropping the connection: (5, 1)
> 2013-11-05 17:42:08,511 - INFO [QuorumPeer[myid=1]/0.0.0.0:2181
> :QuorumCnxManager@190] - Have smaller server identifier, so dropping the
> connection: (5, 1)
> 2013-11-05 17:42:08,515 - INFO [QuorumPeer[myid=1]/0.0.0.0:2181
> :QuorumCnxManager@190] - Have smaller server identifier, so dropping the
> connection: (2, 1)
> 2013-11-05 17:42:08,518 - INFO [QuorumPeer[myid=1]/0.0.0.0:2181
> :QuorumCnxManager@190] - Have smaller server identifier, so dropping the
> connection: (3, 1)
> 2013-11-05 17:42:08,522 - INFO [QuorumPeer[myid=1]/0.0.0.0:2181
> :QuorumCnxManager@190] - Have smaller server identifier, so dropping the
> connection: (4, 1)
> 2013-11-05 17:42:08,523 - INFO [QuorumPeer[myid=1]/0.0.0.0:2181
> :FastLeaderElection@774] - Notification time out: 400
>
> Do you have any idea what I am doing wrong here? I asked the same question
> yesterday and I got response the new server should start normally, sync and
> join quorum successfully.
>
> Thank you
> Best, Jae
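
P.S. To illustrate the sid comparison mentioned above, here is a minimal sketch in Python of the rule as it appears from the "Have smaller server identifier" log lines. It is a simplified model, not the actual QuorumCnxManager code, and the function names are made up for illustration. The assumption is that when two peers connect, only the connection opened by the peer with the larger sid is kept, so a server with the lowest sid depends entirely on the other servers being able to reach it.

```python
def keeps_outgoing_connection(my_sid, remote_sid):
    """Simplified tie-break (assumption, modelled on the log output):
    a server keeps a connection it opened only if its own sid is larger
    than the sid of the peer it dialed."""
    return my_sid > remote_sid


def peers_that_must_dial_back(my_sid, peer_sids):
    """Hypothetical helper: peers whose connection we drop, and which
    therefore have to resolve our address and connect to us instead."""
    return [sid for sid in peer_sids
            if sid != my_sid and not keeps_outgoing_connection(my_sid, sid)]


# The replaced server in the quoted log has myid=1 and peers 2..5.
print(peers_that_must_dial_back(1, [2, 3, 4, 5]))
# A server with the highest sid would not depend on any peer dialing back.
print(peers_that_must_dial_back(5, [1, 2, 3, 4]))
```

Under this model, the server with sid 1 relies on servers 2 to 5 connecting to it, which is where a stale cached address on those servers would hurt. It would also explain why replacing the server with the highest sid should work.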
