ZooKeeper_branch34_jdk7 - Build # 897 - Still Failing
See https://builds.apache.org/job/ZooKeeper_branch34_jdk7/897/

### LAST 60 LINES OF THE CONSOLE ###

[...truncated 213627 lines...]
[junit] 2015-05-25 09:36:28,732 [myid:] - INFO [main:JMXEnv@246] - expect:StandaloneServer_port
[junit] 2015-05-25 09:36:28,732 [myid:] - INFO [main:JMXEnv@250] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port11221
[junit] 2015-05-25 09:36:28,732 [myid:] - INFO [main:ClientBase@490] - STOPPING server
[junit] 2015-05-25 09:36:28,733 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@224] - NIOServerCnxn factory exited run method
[junit] 2015-05-25 09:36:28,733 [myid:] - INFO [main:ZooKeeperServer@441] - shutting down
[junit] 2015-05-25 09:36:28,733 [myid:] - INFO [main:SessionTrackerImpl@225] - Shutting down
[junit] 2015-05-25 09:36:28,733 [myid:] - INFO [main:PrepRequestProcessor@760] - Shutting down
[junit] 2015-05-25 09:36:28,733 [myid:] - INFO [main:SyncRequestProcessor@209] - Shutting down
[junit] 2015-05-25 09:36:28,733 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@187] - SyncRequestProcessor exited!
[junit] 2015-05-25 09:36:28,734 [myid:] - INFO [main:FinalRequestProcessor@415] - shutdown of request processor complete
[junit] 2015-05-25 09:36:28,733 [myid:] - INFO [ProcessThread(sid:0 cport:11221)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
[junit] 2015-05-25 09:36:28,734 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2015-05-25 09:36:28,734 [myid:] - INFO [main:JMXEnv@146] - ensureOnly:[]
[junit] 2015-05-25 09:36:28,736 [myid:] - INFO [main:ClientBase@443] - STARTING server
[junit] 2015-05-25 09:36:28,736 [myid:] - INFO [main:ClientBase@364] - CREATING server instance 127.0.0.1:11221
[junit] 2015-05-25 09:36:28,736 [myid:] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2015-05-25 09:36:28,736 [myid:] - INFO [main:ClientBase@339] - STARTING server instance 127.0.0.1:11221
[junit] 2015-05-25 09:36:28,737 [myid:] - INFO [main:ZooKeeperServer@162] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk7/branch-3.4/build/test/tmp/test7109069252678877654.junit.dir/version-2 snapdir /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk7/branch-3.4/build/test/tmp/test7109069252678877654.junit.dir/version-2
[junit] 2015-05-25 09:36:28,739 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
[junit] 2015-05-25 09:36:28,740 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:50004
[junit] 2015-05-25 09:36:28,740 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@827] - Processing stat command from /127.0.0.1:50004
[junit] 2015-05-25 09:36:28,740 [myid:] - INFO [Thread-4:NIOServerCnxn$StatCommand@663] - Stat command output
[junit] 2015-05-25 09:36:28,741 [myid:] - INFO [Thread-4:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:50004 (no session established for client)
[junit] 2015-05-25 09:36:28,741 [myid:] - INFO [main:JMXEnv@229] - ensureParent:[InMemoryDataTree, StandaloneServer_port]
[junit] 2015-05-25 09:36:28,742 [myid:] - INFO [main:JMXEnv@246] - expect:InMemoryDataTree
[junit] 2015-05-25 09:36:28,743 [myid:] - INFO [main:JMXEnv@250] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port11221,name1=InMemoryDataTree
[junit] 2015-05-25 09:36:28,743 [myid:] - INFO [main:JMXEnv@246] - expect:StandaloneServer_port
[junit] 2015-05-25 09:36:28,743 [myid:] - INFO [main:JMXEnv@250] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port11221
[junit] 2015-05-25 09:36:28,743 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@58] - Memory used 10471
[junit] 2015-05-25 09:36:28,743 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@63] - Number of threads 20
[junit] 2015-05-25 09:36:28,743 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@78] - FINISHED TEST METHOD testQuota
[junit] 2015-05-25 09:36:28,744 [myid:] - INFO [main:ClientBase@520] - tearDown starting
[junit] 2015-05-25 09:36:28,821 [myid:] - INFO [main:ZooKeeper@684] - Session: 0x14d8a6e37b9 closed
[junit] 2015-05-25 09:36:28,822 [myid:] - INFO [main:ClientBase@490] - STOPPING server
[junit] 2015-05-25 09:36:28,822 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@512] - EventThread shut down for session: 0x14d8a6e37b9
[junit] 2015-05-25 09:36:28,822 [myid:] - INFO
ZooKeeper-trunk-solaris - Build # 1042 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/1042/

### LAST 60 LINES OF THE CONSOLE ###

Started by timer
Building remotely on solaris1 (Solaris) in workspace /export/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris
Updating http://svn.apache.org/repos/asf/zookeeper/trunk at revision '2015-05-25T08:32:18.579 +'
At revision 1681563
Updating http://svn.apache.org/repos/asf/hadoop/nightly at revision '2015-05-25T08:32:18.579 +'
At revision 1681563
no change for http://svn.apache.org/repos/asf/zookeeper/trunk since the previous build
no change for http://svn.apache.org/repos/asf/hadoop/nightly since the previous build
No emails were triggered.
[locks-and-latches] Checking to see if we really have the locks
[locks-and-latches] Have all the locks, build can start
[ZooKeeper-trunk-solaris] $ /bin/bash /var/tmp/hudson4219020641947618107.sh
/var/tmp/hudson4219020641947618107.sh: line 12: ant: command not found
Build step 'Execute shell' marked build as failure
[locks-and-latches] Releasing all the locks
[locks-and-latches] All the locks released
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure

### FAILED TESTS (if any) ###

No tests ran.
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558059#comment-14558059 ]

Michi Mutsuzaki commented on ZOOKEEPER-2172:
--------------------------------------------

Thanks Ziyou. [~fpj] [~shralex], it looks like node1 and node2 are not forming a quorum because node2 has seen zxid 0x10059 but node1 keeps sending 0x0 as its zxid. Isn't node1 supposed to send the highest zxid it has seen?

From zookeeper-1.log:

{noformat}
2015-05-25 12:34:36,920 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@423] - Sending new notification. My id =1 recipient=2 zxid=0x0 leader=1 config version = 10049
2015-05-25 12:34:39,090 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@423] - Sending new notification. My id =1 recipient=3 zxid=0x0 leader=1 config version = 10049
2015-05-25 12:35:28,128 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@423] - Sending new notification. My id =1 recipient=2 zxid=0x0 leader=1 config version = 10049
2015-05-25 12:35:30,301 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@423] - Sending new notification. My id =1 recipient=3 zxid=0x0 leader=1 config version = 10049
{noformat}

From zookeeper-2.log:

{noformat}
2015-05-25 12:34:36,918 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@698] - Notification: 2 (message format version), 2 (n.leader), 0x10059 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)10049 (n.config version)
2015-05-25 12:34:36,923 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@698] - Notification: 2 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x (n.round), LEADING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)10049 (n.config version)
2015-05-25 12:35:28,124 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.0.0.2:1300:FastLeaderElection@688] - Sending Notification: 2 (n.leader), 0x10059 (n.zxid), 0x1 (n.round), 1 (recipient), 2 (myid), 0x1 (n.peerEpoch)
2015-05-25 12:35:28,125 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.0.0.2:1300:FastLeaderElection@688] - Sending Notification: 2 (n.leader), 0x10059 (n.zxid), 0x1 (n.round), 2 (recipient), 2 (myid), 0x1 (n.peerEpoch)
{noformat}

Cluster crashes when reconfig a new node as a participant
---------------------------------------------------------

                 Key: ZOOKEEPER-2172
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172
             Project: ZooKeeper
          Issue Type: Bug
          Components: leaderElection, quorum, server
    Affects Versions: 3.5.0
         Environment: Ubuntu 12.04 + java 7
            Reporter: Ziyou Wang
            Priority: Critical
         Attachments: node-1.log, node-2.log, node-3.log, zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next, zookeeper-1.log, zookeeper-2.log, zookeeper-3.log

The operations are quite simple: start three zk servers one by one, then reconfig the cluster to add the new one as a participant. When I add the third one, the zk cluster may enter a weird state and cannot recover. I found "2015-04-20 12:53:48,236 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig" in the node-1 log, so the first node received the reconfig cmd at 12:53:48. Later, it logged "2015-04-20 12:53:52,230 [myid:1] - ERROR [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception causing shutdown while sock still open" and "2015-04-20 12:53:52,231 [myid:1] - WARN [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE /10.0.0.2:55890 ". From then on, the first and second nodes rejected all client connections and the third node didn't join the cluster as a participant. The whole cluster was done.

When the problem happened, all three nodes used the same dynamic config file zoo.cfg.dynamic.1005d, which only contained the first two nodes, but there was another unused dynamic config file in the node-1 directory, zoo.cfg.dynamic.next, which already contained three nodes. When I extended the waiting time between starting the third node and reconfiguring the cluster, the problem didn't show again, so it should be a race condition problem.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
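A note for readers following Michi's analysis: a zxid of 0x0 matters because of the vote-ordering rule in leader election. The sketch below is a simplified, illustrative version of that comparison (the real FastLeaderElection.totalOrderPredicate also consults quorum weights) and is not the ZooKeeper source; it shows why node2, having seen 0x10059, would not adopt a vote advertising zxid 0x0 at the same epoch.

```java
// Simplified sketch of ZooKeeper's vote-ordering rule: a candidate vote wins
// only if it has a higher epoch, or the same epoch and a higher zxid, or the
// same epoch and zxid and a higher server id.
public class VoteOrder {
    static boolean newVoteWins(long newEpoch, long newZxid, long newId,
                               long curEpoch, long curZxid, long curId) {
        if (newEpoch != curEpoch) return newEpoch > curEpoch;
        if (newZxid != curZxid) return newZxid > curZxid;
        return newId > curId;
    }

    public static void main(String[] args) {
        // node1 (id=1) advertises zxid 0x0; node2 (id=2) has seen 0x10059 at the
        // same epoch, so node1's vote loses and node2 keeps LOOKING.
        System.out.println(newVoteWins(0x1, 0x0, 1, 0x1, 0x10059, 2)); // prints false
    }
}
```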
ZooKeeper-trunk-openjdk7 - Build # 821 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/821/

### LAST 60 LINES OF THE CONSOLE ###

[...truncated 369784 lines...]
[junit] java.lang.InterruptedException
[junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
[junit] at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
[junit] at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
[junit] at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:877)
[junit] at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:598)
[junit] 2015-05-25 19:56:44,616 [myid:] - WARN [LearnerHandler-/127.0.0.1:52700:LearnerHandler@879] - Ignoring unexpected exception
[junit] java.lang.InterruptedException
[junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
[junit] at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
[junit] at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
[junit] at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:877)
[junit] at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:598)
[junit] 2015-05-25 19:56:44,617 [myid:] - INFO [ConnnectionExpirer:NIOServerCnxnFactory$ConnectionExpirerThread@583] - ConnnectionExpirerThread interrupted
[junit] 2015-05-25 19:56:44,618 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method
[junit] 2015-05-25 19:56:44,618 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:30211:NIOServerCnxnFactory$AcceptThread@219] - accept thread exitted run method
[junit] 2015-05-25 19:56:44,618 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-1:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method
[junit] 2015-05-25 19:56:44,619 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:30211)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.3,name2=Leader]
[junit] 2015-05-25 19:56:44,619 [myid:] - INFO [/127.0.0.1:30213:QuorumCnxManager$Listener@654] - Leaving listener
[junit] 2015-05-25 19:56:44,619 [myid:] - WARN [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:30211)(secure=disabled):QuorumPeer@1039] - Unexpected exception
[junit] java.lang.InterruptedException
[junit] at java.lang.Object.wait(Native Method)
[junit] at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:559)
[junit] at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1036)
[junit] 2015-05-25 19:56:44,619 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:30211)(secure=disabled):Leader@613] - Shutting down
[junit] 2015-05-25 19:56:44,619 [myid:] - WARN [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:30211)(secure=disabled):QuorumPeer@1070] - PeerState set to LOOKING
[junit] 2015-05-25 19:56:44,619 [myid:] - WARN [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:30211)(secure=disabled):QuorumPeer@1052] - QuorumPeer main thread exited
[junit] 2015-05-25 19:56:44,619 [myid:] - INFO [main:QuorumUtil@254] - Shutting down leader election QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:30211)(secure=disabled)
[junit] 2015-05-25 19:56:44,619 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:30211)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3]
[junit] 2015-05-25 19:56:44,619 [myid:] - INFO [main:QuorumUtil@259] - Waiting for QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:30211)(secure=disabled) to exit thread
[junit] 2015-05-25 19:56:44,620 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:30211)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.3]
[junit] 2015-05-25 19:56:44,620 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:30211)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.1]
[junit] 2015-05-25 19:56:44,620 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:30211)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.2]
[junit] 2015-05-25 19:56:44,620 [myid:] - INFO [main:FourLetterWordMain@63] - connecting to 127.0.0.1 30205
[junit] 2015-05-25 19:56:44,621 [myid:] - INFO [main:QuorumUtil@243] - 127.0.0.1:30205
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558632#comment-14558632 ]

Ziyou Wang commented on ZOOKEEPER-2172:
---------------------------------------

The reason there are so many clients connecting and disconnecting is that we use the zk CLI to do the operations, so the CLI just creates a client, performs the operation, and shuts it down. From our tests, I think this is a race condition and probably unrelated to the CLI. When the cluster runs in a faster environment, we only need to wait a short interval between starting node 3 and reconfiguring it to join the cluster, and we can almost always avoid the problem. But when we move to a slower environment it can be reproduced consistently again, so the temporary workaround is to extend the waiting time.
[jira] [Commented] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558634#comment-14558634 ]

Yasuhito Fukuda commented on ZOOKEEPER-2193:
--------------------------------------------

Hi [~shralex], [~mitake], thanks for your comments. I have recreated the patch to reflect them; please see patch v2. A loopback address (localhost/127.0.0.1) or wildcard address (0.0.0.0) may legitimately be the same address:port on different servers, I think, so I added code to exclude them from the validation.

reconfig command completes even if parameter is wrong obviously
---------------------------------------------------------------

                 Key: ZOOKEEPER-2193
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193
             Project: ZooKeeper
          Issue Type: Bug
          Components: leaderElection, server
    Affects Versions: 3.5.0
         Environment: CentOS7 + Java7
            Reporter: Yasuhito Fukuda
         Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193.patch

Even when a reconfig parameter is obviously wrong, the command was confirmed to complete. Refer to the following.

- Ensemble consists of four nodes

{noformat}
[zk: vm-101:2181(CONNECTED) 0] config
server.1=192.168.100.101:2888:3888:participant
server.2=192.168.100.102:2888:3888:participant
server.3=192.168.100.103:2888:3888:participant
server.4=192.168.100.104:2888:3888:participant
version=1
{noformat}

- Add a node by the reconfig command

{noformat}
[zk: vm-101:2181(CONNECTED) 9] reconfig -add server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
Committed new configuration:
server.1=192.168.100.101:2888:3888:participant
server.2=192.168.100.102:2888:3888:participant
server.3=192.168.100.103:2888:3888:participant
server.4=192.168.100.104:2888:3888:participant
server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
version=30007
{noformat}

The IP addresses of server.4 and server.5 are duplicates. In this state, leader election will not work properly, and the ensemble may end up in an undesirable state. I think reconfig needs parameter validation.
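For readers of the thread, the validation idea described in the comment above can be sketched as follows. All names here (isExcluded, hasDuplicateEndpoint) are hypothetical illustrations, not the actual ZOOKEEPER-2193 patch; the loopback/wildcard exclusion mirrors what the comment describes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of duplicate-endpoint validation for reconfig:
// reject an address:port already owned by a *different* server, but skip
// loopback and wildcard addresses, which may legitimately repeat.
public class ReconfigValidation {

    static boolean isExcluded(String host) {
        return host.equals("0.0.0.0") || host.equals("localhost") || host.equals("127.0.0.1");
    }

    /**
     * existing maps "host:port" -> server id in the proposed configuration.
     * Returns true when hostPort is already owned by a different server.
     */
    static boolean hasDuplicateEndpoint(Map<String, Long> existing, String hostPort, long sid) {
        String host = hostPort.substring(0, hostPort.lastIndexOf(':'));
        if (isExcluded(host)) {
            return false; // wildcard/loopback are allowed to repeat
        }
        Long owner = existing.get(hostPort);
        return owner != null && owner != sid;
    }

    public static void main(String[] args) {
        Map<String, Long> endpoints = new HashMap<>();
        endpoints.put("192.168.100.104:2888", 4L);
        // Adding server.5 on server.4's quorum address should be flagged:
        System.out.println(hasDuplicateEndpoint(endpoints, "192.168.100.104:2888", 5L)); // prints true
    }
}
```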
[jira] [Commented] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558599#comment-14558599 ]

Alexander Shraer commented on ZOOKEEPER-2193:
---------------------------------------------

Thanks for the patch! Here are some comments: one can use reconfig -add to change the ports of a server dynamically, basically adding a server that is already in the system with different ports. In that case it is possible that all the ports but one will be the same. To do this, nextServers.remove(qs.id); removes the old spec of a server and the following line adds the new spec. To avoid checking a server against its old self, I think that nextServers.remove(qs.id); should go before your checks. Also, why do you need joinedServers? Isn't nextServers, defined above your code, sufficient?

[~michim], can you take a look as well? Do you know whether the same address:port can be shared by different servers? (Please see the comment above by [~Yasuhito Fukuda].)
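The ordering point in the review above (remove the old spec before the duplicate checks) can be illustrated with a small sketch. The helper below is hypothetical and merely stands in for the logic around nextServers.remove(qs.id); it is not the patch itself.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: drop a server's old spec *before* running duplicate checks, so a
// port change of an existing server is not rejected against its own
// previous address.
public class ReconfigOrdering {

    /** nextServers maps server id -> "host:port"; returns true if accepted. */
    static boolean addServer(Map<Long, String> nextServers, long sid, String endpoint) {
        nextServers.remove(sid);                   // remove the old spec first (the review's point)
        if (nextServers.containsValue(endpoint)) { // only *other* servers remain to collide with
            return false;                          // genuine duplicate: reject
        }
        nextServers.put(sid, endpoint);
        return true;
    }

    public static void main(String[] args) {
        Map<Long, String> next = new HashMap<>();
        next.put(1L, "10.0.0.1:3888");
        System.out.println(addServer(next, 1L, "10.0.0.1:3999")); // same server, new port: prints true
        System.out.println(addServer(next, 2L, "10.0.0.1:3999")); // collides with server 1: prints false
    }
}
```

Had the removal come after the checks, the first call (changing server 1's own port) would not be affected, but a check comparing the new spec against a map that still contains server 1's old entry could wrongly flag legitimate updates that keep some ports unchanged.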
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558594#comment-14558594 ]

Alexander Shraer commented on ZOOKEEPER-2172:
---------------------------------------------

Michi, I agree that it's weird, and again I see this 0x round number, which I think causes the other server to ignore the leader's messages and keep looking even though server 1 is leading (without timing out, for some reason). This looks related to ZOOKEEPER-1732 and ZOOKEEPER-1805. [~fpj], could you take a look?

It also seems like a lot of client sessions are being established and destroyed (clients connect and disconnect). In particular, when the reconfig adding server 3 happens (12:33:21,797 on server 1), the client session 0x14d8b08424a0014 (the client that submitted the reconfig) gets closed in the middle of the operation. Then the connection to server 2 is suddenly closed with this error (on server 2):

{noformat}
2015-05-25 12:33:25,786 [myid:2] - WARN [QuorumPeer[myid=2]/10.0.0.2:1300:Follower@92] - Exception when following the leader
java.net.SocketTimeoutException: Read timed out
{noformat}

Could it be that the termination of a client session in the middle of an op messes up server-to-server connections?
[jira] [Updated] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yasuhito Fukuda updated ZOOKEEPER-2193:
---------------------------------------
    Attachment: ZOOKEEPER-2193-v2.patch
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558641#comment-14558641 ]

Alexander Shraer commented on ZOOKEEPER-2172:
---------------------------------------------

When you run in the slower environment, does server 2 still disconnect, as in the logs, when you add server 3? Can you please provide your initial configs for all 3 servers, and the reconfig commands you run? I'd like to try to reproduce this.
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558709#comment-14558709 ]

Alexander Shraer commented on ZOOKEEPER-2172:
---------------------------------------------

Thanks. The basic scenario you described works for me, but you're right, of course: there may be a race condition. It would be great if you could help investigate this further. For example, why does server 2 disconnect when you add server 3? BTW, server 3 doesn't even have to be up in order for you to add it to the ensemble, since you have 2 out of 3 servers without it. I also wonder whether the problem has anything to do with the client shutting off right after: does it happen if you don't close the connection on the client?
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558698#comment-14558698 ] Ziyou Wang commented on ZOOKEEPER-2172: --- Yes, it has the same problem. The steps are:
1. Start server 1. Its zoo.cfg contains: standaloneEnabled=false syncLimit=2 tickTime=2000 initLimit=5, and its dynamic config file contains just itself.
2. Start server 2. Its zoo.cfg is the same, and its initial dynamic config file contains server 1 and itself. Then reconfig it as a participant after it syncs data from server 1 (you need to try a few times until the reconfig can finish).
3. Start server 3 with the same zoo.cfg and an initial dynamic config file containing servers 1, 2, and itself. Then reconfig it as the third participant (where we hit the bug).
Since it is a race condition, it may not be easy to reproduce in your environment. In my dev setup, the zk server runs in a docker-in-VM environment. If I change the script to wait just 5 secs after starting server 3, the problem can be reproduced every time.
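For reference, the per-server configuration described in the steps above can be sketched as follows; the zoo.cfg keys are the ones quoted in the comment, while the paths, addresses, and dynamic-file entries are illustrative, not taken from the attached logs:

```
# zoo.cfg -- identical on all three servers, per the steps above
standaloneEnabled=false
syncLimit=2
tickTime=2000
initLimit=5
# illustrative additions: a real deployment also needs at least these
dataDir=/var/lib/zookeeper
dynamicConfigFile=/var/lib/zookeeper/zoo.cfg.dynamic

# zoo.cfg.dynamic as server 3 would see it at startup (addresses illustrative)
server.1=10.0.0.1:2888:3888:participant;2181
server.2=10.0.0.2:2888:3888:participant;2181
server.3=10.0.0.3:2888:3888:participant;2181
```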
[jira] [Commented] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14558642#comment-14558642 ] Alexander Shraer commented on ZOOKEEPER-2193: - Thanks! I'll take a closer look tomorrow, but as I mentioned, the line nextServers.remove(qs.id); should probably go before your changes, not after; otherwise we won't be able to change a server's port.

reconfig command completes even if parameter is wrong obviously
Key: ZOOKEEPER-2193 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193 Project: ZooKeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.5.0 Environment: CentOS7 + Java7 Reporter: Yasuhito Fukuda Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193.patch

Even if a reconfig parameter is obviously wrong, the command was confirmed to complete. Refer to the following.
- The ensemble consists of four nodes:
{noformat}
[zk: vm-101:2181(CONNECTED) 0] config
server.1=192.168.100.101:2888:3888:participant
server.2=192.168.100.102:2888:3888:participant
server.3=192.168.100.103:2888:3888:participant
server.4=192.168.100.104:2888:3888:participant
version=1
{noformat}
- Add a node with the reconfig command:
{noformat}
[zk: vm-101:2181(CONNECTED) 9] reconfig -add server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
Committed new configuration:
server.1=192.168.100.101:2888:3888:participant
server.2=192.168.100.102:2888:3888:participant
server.3=192.168.100.103:2888:3888:participant
server.4=192.168.100.104:2888:3888:participant
server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
version=30007
{noformat}
The IP addresses of server.4 and server.5 are duplicates. In this state, leader election will not work properly, and the ensemble may end up in an undesirable state. I think parameter validation is needed for reconfig.
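The missing validation could take a shape like the following sketch: before committing a reconfig -add, check that the candidate's quorum address does not collide with an address already in the configuration. The class and method names here are hypothetical, not ZooKeeper's actual API; the spec strings follow the server.N=host:quorumPort:electionPort:role;clientAddr format shown above.

```java
import java.util.*;

// Hypothetical sketch of the validation proposed in ZOOKEEPER-2193:
// reject a reconfig -add whose host:quorumPort duplicates an existing server's.
public class ReconfigValidator {

    /** Extracts "host:quorumPort" from a spec like
     *  "server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181". */
    static String quorumAddress(String spec) {
        String rhs = spec.substring(spec.indexOf('=') + 1);
        String serverPart = rhs.split(";")[0];   // drop the client address, if any
        String[] fields = serverPart.split(":");
        return fields[0] + ":" + fields[1];      // host:quorumPort
    }

    /** Returns true if the candidate's quorum address duplicates one
     *  already present in the current configuration. */
    static boolean hasDuplicateAddress(List<String> current, String candidate) {
        Set<String> seen = new HashSet<>();
        for (String spec : current) {
            seen.add(quorumAddress(spec));
        }
        return !seen.add(quorumAddress(candidate));  // add() is false on duplicate
    }

    public static void main(String[] args) {
        List<String> config = Arrays.asList(
            "server.1=192.168.100.101:2888:3888:participant",
            "server.2=192.168.100.102:2888:3888:participant",
            "server.3=192.168.100.103:2888:3888:participant",
            "server.4=192.168.100.104:2888:3888:participant");

        // server.5 reuses server.4's IP and quorum port: should be rejected
        String bad = "server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181";
        System.out.println(hasDuplicateAddress(config, bad));   // true

        String ok = "server.5=192.168.100.105:2888:3888:participant;0.0.0.0:2181";
        System.out.println(hasDuplicateAddress(config, ok));    // false
    }
}
```

A real fix would live in the leader's reconfig request processing (near the nextServers handling discussed above) and would likely also need to tolerate an intentional port change for an existing server id, which is why the ordering of nextServers.remove(qs.id) matters.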
[jira] [Updated] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Shraer updated ZOOKEEPER-2193: Assignee: Yasuhito Fukuda
Failed: ZOOKEEPER-2193 PreCommit Build #2711
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2193 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2711/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 370849 lines...]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec] -1 core tests. The patch failed core unit tests.
[exec] +1 contrib tests. The patch passed contrib unit tests.
[exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2711//testReport/
[exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2711//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
[exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2711//console
[exec] This message is automatically generated.
[exec] Adding comment to Jira.
[exec] Comment added.
[exec] cd47fc2ed4cc3623f810450e59d5b2f10f8bf2e3 logged out
[exec] Finished build.
[exec] BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1782: exec returned: 2
Total time: 8 minutes 0 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-ZOOKEEPER-Build #2708
Archived 7 artifacts
Archive block size is 32768
Received 3 blocks and 465417 bytes
Compression is 17.4%
Took 4.2 sec
Recording test results
Description set: ZOOKEEPER-2193
Email was triggered for: Failure
Sending email for trigger: Failure

### ## FAILED TESTS (if any) ##
1 tests failed.
REGRESSION: org.apache.zookeeper.server.quorum.LearnerSnapshotThrottlerTest.testHighContentionWithTimeout
Error Message: null
Stack Trace:
junit.framework.AssertionFailedError
	at org.apache.zookeeper.server.quorum.LearnerSnapshotThrottlerTest.testHighContentionWithTimeout(LearnerSnapshotThrottlerTest.java:211)
	at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)