[jira] [Commented] (ZOOKEEPER-2170) Zookeeper is not logging as per the configuration in log4j.properties
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602484#comment-14602484 ] Arshad Mohammad commented on ZOOKEEPER-2170: Let me further clarify what the problems with the current default configuration are, as I understand them. Suppose I download the latest ZooKeeper, install it, and run it without any configuration change. Logs go to the console logger, and the console is redirected to the file zookeeper-root-server-host-name.out.
{color:red}Problem 1:{color} This file keeps growing. If ZooKeeper runs for many days, or the logging frequency is high (e.g. on errors), the file grows to gigabytes, big enough that it cannot be opened.
{color:red}Problem 2:{color} When I restart ZooKeeper, it overwrites the previous zookeeper-root-server-host-name.out file and creates a new one, so all the log history is gone.
{color:red}Problem 3:{color} After observing Problems 1 and 2, any user would go and modify log4j.properties, but that has no effect, as I explained in my earlier comments.
You are right [~rgs]: [~cnauroth]'s patch in a way aligns with Hadoop's configuration, but it is different from what this JIRA expects.
Maybe [~cnauroth]'s patch can be committed as part of a different JIRA. The expectation of this JIRA is: 1) the default logging behaviour should come from log4j.properties; 2) it would be good to make ROLLINGFILE the default logger. Zookeeper is not logging as per the configuration in log4j.properties -- Key: ZOOKEEPER-2170 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2170 Project: ZooKeeper Issue Type: Bug Reporter: Arshad Mohammad Assignee: Chris Nauroth Fix For: 3.6.0 Attachments: ZOOKEEPER-2170.001.patch In conf/log4j.properties the default root logger is {code} zookeeper.root.logger=INFO, CONSOLE {code} Changing the root logger to the value below, or to any other value, has no effect on logging: {code} zookeeper.root.logger=DEBUG, ROLLINGFILE {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
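For context on Problem 3: a likely reason that edits to conf/log4j.properties appear to have no effect is that the startup scripts pass -Dzookeeper.root.logger (via ZOO_LOG4J_PROP in zkEnv.sh), and the system property overrides the file's default. A sketch of what a ROLLINGFILE-by-default configuration could look like follows; the keys mirror the stock conf/log4j.properties, but treat the exact values (file name, sizes) as placeholders, not the committed fix:

```properties
# Sketch: make ROLLINGFILE the default appender.
# Note: zkServer.sh may still override zookeeper.root.logger via
# -Dzookeeper.root.logger=... (ZOO_LOG4J_PROP in zkEnv.sh).
zookeeper.root.logger=INFO, ROLLINGFILE
zookeeper.log.dir=.
zookeeper.log.file=zookeeper.log
zookeeper.log.threshold=INFO

log4j.rootLogger=${zookeeper.root.logger}

log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=${zookeeper.log.threshold}
log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/${zookeeper.log.file}
# Bounded size plus backups addresses Problems 1 and 2:
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
log4j.appender.ROLLINGFILE.MaxBackupIndex=10
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n
```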
[jira] [Created] (ZOOKEEPER-2222) Fail fast if `myid` does not exist but server.N properties are defined
Joe Halliwell created ZOOKEEPER-2222: Summary: Fail fast if `myid` does not exist but server.N properties are defined Key: ZOOKEEPER-2222 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2222 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.4.6 Reporter: Joe Halliwell Priority: Minor Under these circumstances the server logs a warning, but starts in standalone mode. I think it should exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2140) NettyServerCnxn and NIOServerCnxn code should be improved
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602851#comment-14602851 ] Arshad Mohammad commented on ZOOKEEPER-2140: Rebased the patch on top of trunk. NettyServerCnxn and NIOServerCnxn code should be improved - Key: ZOOKEEPER-2140 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2140 Project: ZooKeeper Issue Type: Improvement Reporter: Arshad Mohammad Fix For: 3.6.0 Attachments: ZOOKEEPER-2140-1.patch, ZOOKEEPER-2140-2.patch, ZOOKEEPER-2140-3.patch The classes org.apache.zookeeper.server.NIOServerCnxn and org.apache.zookeeper.server.NettyServerCnxn have the following scope for improvement: 1) Duplicate code. These two classes share around 250 lines of duplicated code; all the command code is duplicated. 2) Many improvements/bug fixes are made in one class but not in the other; these changes should be synced. For example, in NettyServerCnxn: {code} // clone should be faster than iteration // ie give up the cnxns lock faster AbstractSet<ServerCnxn> cnxns; synchronized (factory.cnxns) { cnxns = new HashSet<ServerCnxn>(factory.cnxns); } for (ServerCnxn c : cnxns) { c.dumpConnectionInfo(pw, false); pw.println(); } {code} In NIOServerCnxn: {code} for (ServerCnxn c : factory.cnxns) { c.dumpConnectionInfo(pw, false); pw.println(); } {code} 3) The NettyServerCnxn and NIOServerCnxn classes are unnecessarily bulky. The command classes have altogether different functionality and should go in separate class files. If this is done, it will be easy to add new commands with minimal changes to the existing classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
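The NettyServerCnxn snippet above copies the shared set under its lock and then iterates the copy lock-free, so the `cnxns` lock is held only briefly; the NIOServerCnxn variant iterates the live set directly. A minimal self-contained sketch of the copy-then-iterate pattern (hypothetical names, not the actual ZooKeeper classes):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SnapshotDump {
    // Copy the shared set inside a brief critical section, then iterate
    // the private copy with no lock held (the NettyServerCnxn pattern).
    static List<String> dump(Set<String> cnxns) {
        Set<String> snapshot;
        synchronized (cnxns) {                 // hold the lock only for the copy
            snapshot = new HashSet<>(cnxns);
        }
        List<String> out = new ArrayList<>();
        for (String c : snapshot) {            // no lock held while "dumping"
            out.add("cnxn: " + c);
        }
        Collections.sort(out);                 // deterministic order for display
        return out;
    }

    public static void main(String[] args) {
        Set<String> cnxns = Collections.synchronizedSet(new HashSet<>());
        cnxns.add("127.0.0.1:53903");
        cnxns.add("127.0.0.1:53906");
        System.out.println(dump(cnxns));
    }
}
```

The trade-off is an O(n) copy per dump in exchange for not blocking connection add/remove while the (potentially slow) dump runs; iterating the live set directly, as in NIOServerCnxn, risks a ConcurrentModificationException or torn reads under concurrent updates.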
[jira] [Commented] (ZOOKEEPER-2222) Fail fast if `myid` does not exist but server.N properties are defined
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603153#comment-14603153 ] Joe Halliwell commented on ZOOKEEPER-2222: -- Looking through the code, it's clearly supposed to exit under these circumstances. I'll see if I can provide more details. Fail fast if `myid` does not exist but server.N properties are defined -- Key: ZOOKEEPER-2222 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2222 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.4.6 Reporter: Joe Halliwell Priority: Minor Under these circumstances the server logs a warning, but starts in standalone mode. I think it should exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
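The fail-fast behaviour requested above could look like the following sketch: if the configuration defines server.N entries (i.e. a quorum setup) but no myid was found, refuse to start rather than silently falling back to standalone mode. The class and method names here are hypothetical; the real check would live in the server's config-parsing path (QuorumPeerConfig):

```java
import java.util.Properties;

public class MyidCheck {
    /**
     * Hypothetical sketch of the requested fail-fast check: a quorum
     * config (server.N entries present) without a myid is an error,
     * not an implicit request for standalone mode.
     */
    static void validate(Properties cfg, Long myid) {
        boolean quorumConfigured = cfg.stringPropertyNames().stream()
                .anyMatch(k -> k.startsWith("server."));
        if (quorumConfigured && myid == null) {
            throw new IllegalStateException(
                    "server.N entries are defined but the myid file is missing");
        }
    }

    public static void main(String[] args) {
        Properties cfg = new Properties();
        cfg.setProperty("server.1", "192.168.100.101:2888:3888");
        try {
            validate(cfg, null);               // quorum config, no myid
        } catch (IllegalStateException e) {
            System.out.println("fail fast: " + e.getMessage());
        }
        validate(new Properties(), null);      // standalone config: fine
    }
}
```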
[jira] [Commented] (ZOOKEEPER-2140) NettyServerCnxn and NIOServerCnxn code should be improved
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602967#comment-14602967 ] Hadoop QA commented on ZOOKEEPER-2140: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12742114/ZOOKEEPER-2140-3.patch against trunk revision 1686767. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2780//console This message is automatically generated. NettyServerCnxn and NIOServerCnxn code should be improved - Key: ZOOKEEPER-2140 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2140 Project: ZooKeeper Issue Type: Improvement Reporter: Arshad Mohammad Fix For: 3.6.0 Attachments: ZOOKEEPER-2140-1.patch, ZOOKEEPER-2140-2.patch, ZOOKEEPER-2140-3.patch The classes org.apache.zookeeper.server.NIOServerCnxn and org.apache.zookeeper.server.NettyServerCnxn have the following scope for improvement: 1) Duplicate code. These two classes share around 250 lines of duplicated code; all the command code is duplicated. 2) Many improvements/bug fixes are made in one class but not in the other; these changes should be synced. For example, in NettyServerCnxn: {code} // clone should be faster than iteration // ie give up the cnxns lock faster AbstractSet<ServerCnxn> cnxns; synchronized (factory.cnxns) { cnxns = new HashSet<ServerCnxn>(factory.cnxns); } for (ServerCnxn c : cnxns) { c.dumpConnectionInfo(pw, false); pw.println(); } {code} In NIOServerCnxn: {code} for (ServerCnxn c : factory.cnxns) { c.dumpConnectionInfo(pw, false); pw.println(); } {code} 3) The NettyServerCnxn and NIOServerCnxn classes are unnecessarily bulky. The command classes have altogether different functionality and should go in separate class files. If this is done, it will be easy to add new commands with minimal changes to the existing classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: ZOOKEEPER-2140 PreCommit Build #2780
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2140 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2780/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 185 lines...] [exec] File to patch: [exec] Skip this patch? [y] [exec] Skipping patch. [exec] 1 out of 1 hunk ignored [exec] PATCH APPLICATION FAILED [exec] [exec] [exec] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12742114/ZOOKEEPER-2140-3.patch [exec] against trunk revision 1686767. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 4 new or modified tests. [exec] [exec] -1 patch. The patch command could not apply the patch. [exec] [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2780//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 91638864402fcf8600d1cddc0062ebbe3542b00a logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1782: exec returned: 1 Total time: 44 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-ZOOKEEPER-Build #2752 Archived 1 artifacts Archive block size is 32768 Received 0 blocks and 60820 bytes Compression is 0.0% Took 7.2 sec Recording test results Description set: ZOOKEEPER-2140 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Commented] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602999#comment-14602999 ] Flavio Junqueira commented on ZOOKEEPER-2193: - I remember that we wired QCM so that we could assign ids to observers automatically, but frankly I can't remember having finished the feature. I believe we still require observers to have unique ids in the config, but I can do some investigation to be sure. reconfig command completes even if parameter is wrong obviously --- Key: ZOOKEEPER-2193 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193 Project: ZooKeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.5.0 Environment: CentOS7 + Java7 Reporter: Yasuhito Fukuda Assignee: Yasuhito Fukuda Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193-v3.patch, ZOOKEEPER-2193-v4.patch, ZOOKEEPER-2193-v5.patch, ZOOKEEPER-2193-v6.patch, ZOOKEEPER-2193-v7.patch, ZOOKEEPER-2193-v8.patch, ZOOKEEPER-2193.patch Even an obviously wrong reconfig parameter was confirmed to complete; refer to the following.
- The ensemble consists of four nodes: {noformat} [zk: vm-101:2181(CONNECTED) 0] config server.1=192.168.100.101:2888:3888:participant server.2=192.168.100.102:2888:3888:participant server.3=192.168.100.103:2888:3888:participant server.4=192.168.100.104:2888:3888:participant version=1 {noformat}
- Add a node with the reconfig command: {noformat} [zk: vm-101:2181(CONNECTED) 9] reconfig -add server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181 Committed new configuration: server.1=192.168.100.101:2888:3888:participant server.2=192.168.100.102:2888:3888:participant server.3=192.168.100.103:2888:3888:participant server.4=192.168.100.104:2888:3888:participant server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181 version=30007 {noformat}
The address of server.5 duplicates that of server.4 (same IP and ports). In this state, leader election will not work properly. Moreover, the ensemble is assumed to end up in an undesirable state. I think reconfig needs parameter validation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
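A sketch of the kind of validation the reporter asks for: before committing a reconfig -add, reject a new server entry whose quorum/election address duplicates an existing one. Class and method names are hypothetical; in ZooKeeper the check would belong in the reconfig/config-parsing path:

```java
import java.util.HashMap;
import java.util.Map;

public class ReconfigValidator {
    // Extract "host:quorumPort:electionPort" from a server spec such as
    // "192.168.100.104:2888:3888:participant;0.0.0.0:2181".
    static String addressOf(String spec) {
        String serverPart = spec.split(";", 2)[0];   // drop the client address
        String[] parts = serverPart.split(":");
        return parts[0] + ":" + parts[1] + ":" + parts[2];
    }

    /**
     * Hypothetical validation: throw if the new server's address is
     * already used by a different server id in the ensemble.
     */
    static void checkNoDuplicate(Map<String, String> ensemble,
                                 String newId, String newSpec) {
        String newAddr = addressOf(newSpec);
        for (Map.Entry<String, String> e : ensemble.entrySet()) {
            if (!e.getKey().equals(newId)
                    && addressOf(e.getValue()).equals(newAddr)) {
                throw new IllegalArgumentException(
                        newId + " duplicates the address of " + e.getKey()
                        + ": " + newAddr);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> ensemble = new HashMap<>();
        ensemble.put("server.4", "192.168.100.104:2888:3888:participant");
        try {
            // The reconfig -add from the report: same address as server.4.
            checkNoDuplicate(ensemble, "server.5",
                    "192.168.100.104:2888:3888:participant;0.0.0.0:2181");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```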
[jira] [Resolved] (ZOOKEEPER-2222) Fail fast if `myid` does not exist but server.N properties are defined
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Halliwell resolved ZOOKEEPER-2222. -- Resolution: Invalid My mistake -- the config the server was using did not define any server.N entries. Fail fast if `myid` does not exist but server.N properties are defined -- Key: ZOOKEEPER-2222 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2222 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.4.6 Reporter: Joe Halliwell Priority: Minor Under these circumstances the server logs a warning, but starts in standalone mode. I think it should exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2164) fast leader election keeps failing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603285#comment-14603285 ] Hongchao Deng commented on ZOOKEEPER-2164: -- It's on my plan to have a patch for this. I'm currently involved in internal stuff; I should be able to get onto this after that. In the meantime, it sounds like you have a good testing plan. It would be nice if you could share it. :) fast leader election keeps failing -- Key: ZOOKEEPER-2164 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.4.5 Reporter: Michi Mutsuzaki Assignee: Hongchao Deng Fix For: 3.5.2, 3.6.0 I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader. When I shut down 2, 1 and 3 keep going back to leader election. Here is what seems to be happening. - Both 1 and 3 elect 3 as the leader. - 1 receives votes from 3 and itself, and starts trying to connect to 3 as a follower. - 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't time out for 5 seconds: https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346 - By the time 3 receives votes, 1 has given up trying to connect to 3: https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247 I'm using 3.4.5, but it looks like this part of the code hasn't changed for a while, so I'm guessing later versions have the same issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
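The failure mode described above hinges on a synchronous connect to a dead peer blocking the election loop for the full timeout, while the other side's follower handshake gives up. A minimal sketch of bounding a blocking connect (illustrative only; it is not the ZooKeeper fix, which involves handling the connect off the critical path):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class BoundedConnect {
    // Attempt a TCP connect with an explicit upper bound on blocking time.
    // A dead or unreachable peer then costs at most timeoutMs instead of
    // stalling the caller (here: the election message loop) for seconds.
    static boolean tryConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;   // fail fast on an unreachable peer
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // 10.255.255.1 is a typically non-routable address used for the demo.
        boolean ok = tryConnect("10.255.255.1", 3888, 500);
        System.out.println("connected=" + ok + " after ~"
                + (System.currentTimeMillis() - start) + "ms");
    }
}
```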
ZooKeeper-trunk-openjdk7 - Build # 854 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/854/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 364280 lines...] [junit] 2015-06-26 20:42:35,376 [myid:] - INFO [main:FileTxnSnapLog@298] - Snapshotting: 0xb to /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk-openjdk7/trunk/build/test/tmp/test1787711313594693480.junit.dir/version-2/snapshot.b [junit] 2015-06-26 20:42:35,416 [myid:] - INFO [main:FourLetterWordMain@63] - connecting to 127.0.0.1 11222 [junit] 2015-06-26 20:42:35,417 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11222:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:53903 [junit] 2015-06-26 20:42:35,431 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@836] - Processing stat command from /127.0.0.1:53903 [junit] 2015-06-26 20:42:35,432 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@685] - Stat command output [junit] 2015-06-26 20:42:35,432 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:53903 (no session established for client) [junit] 2015-06-26 20:42:35,433 [myid:] - INFO [main:JMXEnv@224] - ensureParent:[InMemoryDataTree, StandaloneServer_port] [junit] 2015-06-26 20:42:35,434 [myid:] - INFO [main:JMXEnv@241] - expect:InMemoryDataTree [junit] 2015-06-26 20:42:35,435 [myid:] - INFO [main:JMXEnv@245] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port11222,name1=InMemoryDataTree [junit] 2015-06-26 20:42:35,435 [myid:] - INFO [main:JMXEnv@241] - expect:StandaloneServer_port [junit] 2015-06-26 20:42:35,435 [myid:] - INFO [main:JMXEnv@245] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port11222 [junit] 2015-06-26 20:42:35,435 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@58] - Memory used 85691 [junit] 2015-06-26 20:42:35,436 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@63] - Number of threads 24 [junit] 2015-06-26 
20:42:35,436 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@78] - FINISHED TEST METHOD testQuota [junit] 2015-06-26 20:42:35,436 [myid:] - INFO [main:ClientBase@538] - tearDown starting [junit] 2015-06-26 20:42:35,955 [myid:] - INFO [SessionTracker:SessionTrackerImpl@158] - SessionTrackerImpl exited loop! [junit] 2015-06-26 20:42:35,955 [myid:] - INFO [SessionTracker:SessionTrackerImpl@158] - SessionTrackerImpl exited loop! [junit] 2015-06-26 20:42:36,749 [myid:] - INFO [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1138] - Opening socket connection to server 127.0.0.1/127.0.0.1:11222. Will not attempt to authenticate using SASL (unknown error) [junit] 2015-06-26 20:42:36,750 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11222:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:53906 [junit] 2015-06-26 20:42:36,750 [myid:] - INFO [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@980] - Socket connection established, initiating session, client: /127.0.0.1:53906, server: 127.0.0.1/127.0.0.1:11222 [junit] 2015-06-26 20:42:36,764 [myid:] - INFO [NIOWorkerThread-2:ZooKeeperServer@936] - Client attempting to renew session 0x101603fe65b at /127.0.0.1:53906 [junit] 2015-06-26 20:42:36,765 [myid:] - INFO [NIOWorkerThread-2:ZooKeeperServer@645] - Established session 0x101603fe65b with negotiated timeout 3 for client /127.0.0.1:53906 [junit] 2015-06-26 20:42:36,768 [myid:] - INFO [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1400] - Session establishment complete on server 127.0.0.1/127.0.0.1:11222, sessionid = 0x101603fe65b, negotiated timeout = 3 [junit] 2015-06-26 20:42:36,774 [myid:] - INFO [ProcessThread(sid:0 cport:11222)::PrepRequestProcessor@640] - Processed session termination for sessionid: 0x101603fe65b [junit] 2015-06-26 20:42:36,779 [myid:] - INFO [SyncThread:0:FileTxnLog@200] - Creating new log file: log.c [junit] 2015-06-26 20:42:36,786 [myid:] - INFO [main:ZooKeeper@1110] - 
Session: 0x101603fe65b closed [junit] 2015-06-26 20:42:36,786 [myid:] - INFO [main:ClientBase@508] - STOPPING server [junit] 2015-06-26 20:42:36,786 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@542] - EventThread shut down for session: 0x101603fe65b [junit] 2015-06-26 20:42:36,787 [myid:] - INFO [NIOWorkerThread-5:MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port11222,name1=Connections,name2=127.0.0.1,name3=0x101603fe65b] [junit] 2015-06-26 20:42:36,788 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11222:NIOServerCnxnFactory$AcceptThread@219] - accept thread
ZooKeeper_branch34_openjdk7 - Build # 919 - Failure
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/919/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 215970 lines...] [junit] 2015-06-26 10:07:29,082 [myid:] - INFO [main:JMXEnv@246] - expect:StandaloneServer_port [junit] 2015-06-26 10:07:29,082 [myid:] - INFO [main:JMXEnv@250] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port11221 [junit] 2015-06-26 10:07:29,083 [myid:] - INFO [main:ClientBase@490] - STOPPING server [junit] 2015-06-26 10:07:29,083 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@219] - NIOServerCnxn factory exited run method [junit] 2015-06-26 10:07:29,083 [myid:] - INFO [main:ZooKeeperServer@441] - shutting down [junit] 2015-06-26 10:07:29,084 [myid:] - INFO [main:SessionTrackerImpl@225] - Shutting down [junit] 2015-06-26 10:07:29,084 [myid:] - INFO [main:PrepRequestProcessor@769] - Shutting down [junit] 2015-06-26 10:07:29,084 [myid:] - INFO [main:SyncRequestProcessor@209] - Shutting down [junit] 2015-06-26 10:07:29,084 [myid:] - INFO [ProcessThread(sid:0 cport:11221)::PrepRequestProcessor@145] - PrepRequestProcessor exited loop! [junit] 2015-06-26 10:07:29,085 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@187] - SyncRequestProcessor exited! 
[junit] 2015-06-26 10:07:29,085 [myid:] - INFO [main:FinalRequestProcessor@415] - shutdown of request processor complete [junit] 2015-06-26 10:07:29,086 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2015-06-26 10:07:29,086 [myid:] - INFO [main:JMXEnv@146] - ensureOnly:[] [junit] 2015-06-26 10:07:29,088 [myid:] - INFO [main:ClientBase@443] - STARTING server [junit] 2015-06-26 10:07:29,088 [myid:] - INFO [main:ClientBase@364] - CREATING server instance 127.0.0.1:11221 [junit] 2015-06-26 10:07:29,089 [myid:] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:11221 [junit] 2015-06-26 10:07:29,089 [myid:] - INFO [main:ClientBase@339] - STARTING server instance 127.0.0.1:11221 [junit] 2015-06-26 10:07:29,089 [myid:] - INFO [main:ZooKeeperServer@162] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/branch-3.4/build/test/tmp/test6680355425688285248.junit.dir/version-2 snapdir /home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/branch-3.4/build/test/tmp/test6680355425688285248.junit.dir/version-2 [junit] 2015-06-26 10:07:29,094 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221 [junit] 2015-06-26 10:07:29,095 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@192] - Accepted socket connection from /127.0.0.1:38287 [junit] 2015-06-26 10:07:29,095 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@827] - Processing stat command from /127.0.0.1:38287 [junit] 2015-06-26 10:07:29,095 [myid:] - INFO [Thread-4:NIOServerCnxn$StatCommand@663] - Stat command output [junit] 2015-06-26 10:07:29,096 [myid:] - INFO [Thread-4:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:38287 (no session established for client) [junit] 2015-06-26 10:07:29,096 [myid:] - INFO [main:JMXEnv@229] - ensureParent:[InMemoryDataTree, 
StandaloneServer_port] [junit] 2015-06-26 10:07:29,099 [myid:] - INFO [main:JMXEnv@246] - expect:InMemoryDataTree [junit] 2015-06-26 10:07:29,099 [myid:] - INFO [main:JMXEnv@250] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port11221,name1=InMemoryDataTree [junit] 2015-06-26 10:07:29,099 [myid:] - INFO [main:JMXEnv@246] - expect:StandaloneServer_port [junit] 2015-06-26 10:07:29,099 [myid:] - INFO [main:JMXEnv@250] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port11221 [junit] 2015-06-26 10:07:29,100 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@58] - Memory used 32639 [junit] 2015-06-26 10:07:29,100 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@63] - Number of threads 20 [junit] 2015-06-26 10:07:29,100 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@78] - FINISHED TEST METHOD testQuota [junit] 2015-06-26 10:07:29,101 [myid:] - INFO [main:ClientBase@520] - tearDown starting [junit] 2015-06-26 10:07:29,167 [myid:] - INFO [main:ZooKeeper@684] - Session: 0x14e2f561a9d closed [junit] 2015-06-26 10:07:29,167 [myid:] - INFO [main:ClientBase@490] - STOPPING server [junit] 2015-06-26 10:07:29,167 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@517] - EventThread shut down for session: 0x14e2f561a9d [junit] 2015-06-26 10:07:29,168 [myid:] - INFO
[jira] [Commented] (ZOOKEEPER-2155) network is not good, the watcher in observer env will clear
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603871#comment-14603871 ] Raul Gutierrez Segales commented on ZOOKEEPER-2155: --- Unless we can get a better description of what's going on ... network is not good, the watcher in observer env will clear --- Key: ZOOKEEPER-2155 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2155 Project: ZooKeeper Issue Type: Bug Components: quorum Affects Versions: 3.4.6 Reporter: linking12 Priority: Critical Labels: moreinfo Fix For: 3.5.0 When I set up a ZooKeeper ensemble that uses observers and the network is not very good, I find that all of the watchers disappear. Reading the source code, I find that when the observer connects to the leader, it dumps the DataTree from the leader and rebuilds it on the observer, but the dataWatches and childWatches are cleared by this operation. After I change the code as follows, the watchers no longer disappear: {code} WatchManager dataWatchers = zk.getZKDatabase().getDataTree().getDataWatches(); WatchManager childWatchers = zk.getZKDatabase().getDataTree().getChildWatches(); zk.getZKDatabase().clear(); zk.getZKDatabase().deserializeSnapshot(leaderIs); zk.getZKDatabase().getDataTree().setDataWatches(dataWatchers); zk.getZKDatabase().getDataTree().setChildWatches(childWatchers); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-1525) Plumb ZooKeeperServer object into auth plugins
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603875#comment-14603875 ] Hadoop QA commented on ZOOKEEPER-1525: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12696698/ZOOKEEPER-1525.patch against trunk revision 1687876. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2781//console This message is automatically generated. Plumb ZooKeeperServer object into auth plugins -- Key: ZOOKEEPER-1525 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1525 Project: ZooKeeper Issue Type: Improvement Affects Versions: 3.5.0 Reporter: Warren Turkal Assignee: Warren Turkal Fix For: 3.5.2, 3.6.0 Attachments: ZOOKEEPER-1525.patch, ZOOKEEPER-1525.patch, ZOOKEEPER-1525.patch I want to plumb the ZooKeeperServer object into the auth plugins so that I can store authentication data in zookeeper itself. With access to the ZooKeeperServer object, I also have access to the ZKDatabase and can look up entries in the local copy of the zookeeper data. In order to implement this, I make sure that a ZooKeeperServer instance is passed in to the ProviderRegistry.initialize() method. Then initialize() will try to find a constructor for the AuthenticationProvider that takes a ZooKeeperServer instance. If the constructor is found, it will be used. Otherwise, initialize() will look for a constructor that takes no arguments and use that instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
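The constructor-selection logic described in ZOOKEEPER-1525 (prefer a constructor taking ZooKeeperServer, fall back to the no-arg constructor) is a standard reflection pattern. A self-contained sketch with stand-in types (the names Server, AuthProvider, etc. are hypothetical, not the ZooKeeper API):

```java
import java.lang.reflect.Constructor;

public class PluginLoader {
    public static class Server {}                 // stand-in for ZooKeeperServer

    public interface AuthProvider { String scheme(); }

    public static class ServerAware implements AuthProvider {
        public ServerAware(Server s) {}           // wants the server plumbed in
        public String scheme() { return "server-aware"; }
    }

    public static class Plain implements AuthProvider {
        public Plain() {}                         // legacy no-arg provider
        public String scheme() { return "plain"; }
    }

    // Prefer a (Server) constructor; otherwise fall back to the no-arg one,
    // mirroring the initialize() behaviour described in the issue.
    static AuthProvider instantiate(Class<? extends AuthProvider> cls, Server zs)
            throws ReflectiveOperationException {
        try {
            Constructor<? extends AuthProvider> c = cls.getConstructor(Server.class);
            return c.newInstance(zs);
        } catch (NoSuchMethodException e) {
            return cls.getConstructor().newInstance();
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Server zs = new Server();
        System.out.println(instantiate(ServerAware.class, zs).scheme()); // server-aware
        System.out.println(instantiate(Plain.class, zs).scheme());       // plain
    }
}
```

This keeps old providers working unchanged while letting new ones opt in to the richer constructor, which is why the patch can be backward compatible.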
Failed: ZOOKEEPER-1525 PreCommit Build #2781
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1525 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2781/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 98 lines...] [exec] Hunk #2 FAILED at 34. [exec] Hunk #3 succeeded at 64 (offset 2 lines). [exec] 1 out of 3 hunks FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/server/auth/ProviderRegistry.java.rej [exec] patching file src/java/test/org/apache/zookeeper/test/KeyAuthClientTest.java [exec] PATCH APPLICATION FAILED [exec] [exec] [exec] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12696698/ZOOKEEPER-1525.patch [exec] against trunk revision 1687876. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 2 new or modified tests. [exec] [exec] -1 patch. The patch command could not apply the patch. [exec] [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2781//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] dc734804594abcc5229fbfbcea3736c41a2b574a logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1782: exec returned: 1 Total time: 45 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-ZOOKEEPER-Build #2752 Archived 1 artifacts Archive block size is 32768 Received 0 blocks and 60820 bytes Compression is 0.0% Took 8.3 sec Recording test results Description set: ZOOKEEPER-1525 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Commented] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603793#comment-14603793 ] Raul Gutierrez Segales commented on ZOOKEEPER-2193: --- Ok - thanks for checking [~shralex] and [~fpj]. I'll go ahead and merge then. reconfig command completes even if parameter is wrong obviously --- Key: ZOOKEEPER-2193 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193 Project: ZooKeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.5.0 Environment: CentOS7 + Java7 Reporter: Yasuhito Fukuda Assignee: Yasuhito Fukuda Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193-v3.patch, ZOOKEEPER-2193-v4.patch, ZOOKEEPER-2193-v5.patch, ZOOKEEPER-2193-v6.patch, ZOOKEEPER-2193-v7.patch, ZOOKEEPER-2193-v8.patch, ZOOKEEPER-2193.patch Even an obviously wrong reconfig parameter was confirmed to complete; refer to the following.
- The ensemble consists of four nodes: {noformat} [zk: vm-101:2181(CONNECTED) 0] config server.1=192.168.100.101:2888:3888:participant server.2=192.168.100.102:2888:3888:participant server.3=192.168.100.103:2888:3888:participant server.4=192.168.100.104:2888:3888:participant version=1 {noformat}
- Add a node with the reconfig command: {noformat} [zk: vm-101:2181(CONNECTED) 9] reconfig -add server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181 Committed new configuration: server.1=192.168.100.101:2888:3888:participant server.2=192.168.100.102:2888:3888:participant server.3=192.168.100.103:2888:3888:participant server.4=192.168.100.104:2888:3888:participant server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181 version=30007 {noformat}
The address of server.5 duplicates that of server.4 (same IP and ports). In this state, leader election will not work properly. Moreover, the ensemble is assumed to end up in an undesirable state. I think reconfig needs parameter validation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603795#comment-14603795 ] Raul Gutierrez Segales commented on ZOOKEEPER-2193: --- I think it's been clarified below by [~fpj] and [~shralex] - we can ignore observers using non-unique (i.e.: -1) ids for now. I'll go ahead and merge. Thanks [~Yasuhito Fukuda]! reconfig command completes even if parameter is wrong obviously --- Key: ZOOKEEPER-2193 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193 Project: ZooKeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.5.0 Environment: CentOS7 + Java7 Reporter: Yasuhito Fukuda Assignee: Yasuhito Fukuda Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193-v3.patch, ZOOKEEPER-2193-v4.patch, ZOOKEEPER-2193-v5.patch, ZOOKEEPER-2193-v6.patch, ZOOKEEPER-2193-v7.patch, ZOOKEEPER-2193-v8.patch, ZOOKEEPER-2193.patch Even an obviously wrong reconfig parameter was confirmed to complete; refer to the following.
- The ensemble consists of four nodes: {noformat} [zk: vm-101:2181(CONNECTED) 0] config server.1=192.168.100.101:2888:3888:participant server.2=192.168.100.102:2888:3888:participant server.3=192.168.100.103:2888:3888:participant server.4=192.168.100.104:2888:3888:participant version=1 {noformat}
- Add a node with the reconfig command: {noformat} [zk: vm-101:2181(CONNECTED) 9] reconfig -add server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181 Committed new configuration: server.1=192.168.100.101:2888:3888:participant server.2=192.168.100.102:2888:3888:participant server.3=192.168.100.103:2888:3888:participant server.4=192.168.100.104:2888:3888:participant server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181 version=30007 {noformat}
The address of server.5 duplicates that of server.4 (same IP and ports). In this state, leader election will not work properly. Moreover, the ensemble is assumed to end up in an undesirable state. I think reconfig needs parameter validation.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
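[Editorial note: a hypothetical sketch of the kind of validation the reporter asks for, rejecting a proposed configuration in which two server entries share the same quorum address. The class and method names are illustrative stand-ins, not ZooKeeper's actual reconfig code.]

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative validator: rejects a proposed config in which two server
// entries share the same host:port quorum address.
public class ReconfigValidator {
    // Each entry looks like "server.N=host:port1:port2:role;clientAddr"
    public static void validate(List<String> serverSpecs) {
        Map<String, String> seen = new HashMap<>();
        for (String spec : serverSpecs) {
            String[] kv = spec.split("=", 2);
            // host:port1 identifies the quorum address of the server
            String[] parts = kv[1].split(";")[0].split(":");
            String quorumAddr = parts[0] + ":" + parts[1];
            String prev = seen.put(quorumAddr, kv[0]);
            if (prev != null) {
                throw new IllegalArgumentException(
                    kv[0] + " and " + prev + " share quorum address " + quorumAddr);
            }
        }
    }
}
```

With this check, the `-add server.5=192.168.100.104:...` example above would be rejected because server.4 already uses 192.168.100.104:2888.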
[jira] [Commented] (ZOOKEEPER-2140) NettyServerCnxn and NIOServerCnxn code should be improved
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603823#comment-14603823 ] Raul Gutierrez Segales commented on ZOOKEEPER-2140: --- [~arshad.mohammad]: if you are using git, you need to create the patch with: {code} git diff --no-prefix HEAD~1.. > ZOOKEEPER-2140.patch {code} Otherwise, jenkins won't be able to apply it and run the tests. Please recreate the patch and upload it again, thanks! NettyServerCnxn and NIOServerCnxn code should be improved - Key: ZOOKEEPER-2140 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2140 Project: ZooKeeper Issue Type: Improvement Reporter: Arshad Mohammad Fix For: 3.6.0 Attachments: ZOOKEEPER-2140-1.patch, ZOOKEEPER-2140-2.patch, ZOOKEEPER-2140-3.patch Classes org.apache.zookeeper.server.NIOServerCnxn and org.apache.zookeeper.server.NettyServerCnxn have the following scope for improvement: 1) Duplicate code. These two classes have around 250 lines of duplicate code. All the command code is duplicated. 2) Many improvements/bug fixes are done in one class but not in the other. These changes should be synced. For example, in NettyServerCnxn {code} // clone should be faster than iteration // ie give up the cnxns lock faster AbstractSet<ServerCnxn> cnxns; synchronized (factory.cnxns) { cnxns = new HashSet<ServerCnxn>(factory.cnxns); } for (ServerCnxn c : cnxns) { c.dumpConnectionInfo(pw, false); pw.println(); } {code} In NIOServerCnxn {code} for (ServerCnxn c : factory.cnxns) { c.dumpConnectionInfo(pw, false); pw.println(); } {code} 3) NettyServerCnxn and NIOServerCnxn are unnecessarily bulky. The command classes have altogether different functionality, so they should go into separate class files. If this is done, it will be easy to add new commands with minimal changes to the existing classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
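[Editorial note: on point (1), one way to remove the duplicated connection-dump loop is to hoist it into a shared helper that snapshots the set under the lock and then iterates outside it, as the Netty variant already does. This is only a sketch; the helper name and the simplified stand-in types below are assumptions, not the actual ZooKeeper classes.]

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for ServerCnxn, just to show the shape of the refactor.
class Cnxn {
    final String info;
    Cnxn(String info) { this.info = info; }
    void dumpConnectionInfo(PrintWriter pw, boolean brief) { pw.print(info); }
}

class CnxnDumper {
    // Shared helper: snapshot under the lock, then iterate without holding it,
    // matching the Netty variant's "clone should be faster than iteration" idea.
    static void dumpAll(Set<Cnxn> cnxns, Object lock, PrintWriter pw) {
        Set<Cnxn> snapshot;
        synchronized (lock) {
            snapshot = new HashSet<>(cnxns);
        }
        for (Cnxn c : snapshot) {
            c.dumpConnectionInfo(pw, false);
            pw.println();
        }
    }
}
```

Both NIOServerCnxn and NettyServerCnxn could then delegate to the one helper, and a fix made there is automatically shared.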
[jira] [Commented] (ZOOKEEPER-1525) Plumb ZooKeeperServer object into auth plugins
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603822#comment-14603822 ] Raul Gutierrez Segales commented on ZOOKEEPER-1525: --- Thanks for the patch [~timrc]! I added a few comments in the RB. After updating that, mind re-attaching the patch here as well so CI can run too? Thanks! Plumb ZooKeeperServer object into auth plugins -- Key: ZOOKEEPER-1525 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1525 Project: ZooKeeper Issue Type: Improvement Affects Versions: 3.5.0 Reporter: Warren Turkal Assignee: Warren Turkal Fix For: 3.5.2, 3.6.0 Attachments: ZOOKEEPER-1525.patch, ZOOKEEPER-1525.patch, ZOOKEEPER-1525.patch I want to plumb the ZooKeeperServer object into the auth plugins so that I can store authentication data in zookeeper itself. With access to the ZooKeeperServer object, I also have access to the ZKDatabase and can look up entries in the local copy of the zookeeper data. In order to implement this, I make sure that a ZooKeeperServer instance is passed in to the ProviderRegistry.initialize() method. Then initialize() will try to find a constructor for the AuthenticationProvider that takes a ZooKeeperServer instance. If the constructor is found, it will be used. Otherwise, initialize() will look for a constructor that takes no arguments and use that instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
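[Editorial note: the constructor-selection logic the description outlines (prefer a constructor taking the server object, fall back to no-arg) can be sketched with plain reflection. The classes below are stand-ins, not the actual ProviderRegistry or AuthenticationProvider code.]

```java
import java.lang.reflect.Constructor;

class ZkServerStandIn {}          // stand-in for ZooKeeperServer
interface AuthProviderStandIn {}  // stand-in for AuthenticationProvider

// Example providers, one of each constructor shape.
class NoArgProvider implements AuthProviderStandIn {
    public NoArgProvider() {}
}
class ServerAwareProvider implements AuthProviderStandIn {
    final ZkServerStandIn zks;
    public ServerAwareProvider(ZkServerStandIn zks) { this.zks = zks; }
}

class ProviderLoader {
    // Prefer a (ZkServerStandIn) constructor; otherwise fall back to no-arg,
    // mirroring the lookup order the JIRA description specifies.
    static AuthProviderStandIn instantiate(Class<? extends AuthProviderStandIn> cls,
                                           ZkServerStandIn zks) throws Exception {
        try {
            Constructor<? extends AuthProviderStandIn> c =
                cls.getConstructor(ZkServerStandIn.class);
            return c.newInstance(zks);
        } catch (NoSuchMethodException e) {
            return cls.getConstructor().newInstance();
        }
    }
}
```

The fallback keeps existing no-arg providers working unchanged while letting new ones receive the server instance.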
[jira] [Commented] (ZOOKEEPER-2170) Zookeeper is not logging as per the configuraiton in log4j.properties
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603828#comment-14603828 ] Raul Gutierrez Segales commented on ZOOKEEPER-2170: --- Sounds sensible to me, could you provide a patch for that? Zookeeper is not logging as per the configuraiton in log4j.properties -- Key: ZOOKEEPER-2170 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2170 Project: ZooKeeper Issue Type: Bug Reporter: Arshad Mohammad Assignee: Chris Nauroth Fix For: 3.6.0 Attachments: ZOOKEEPER-2170.001.patch In conf/log4j.properties the default root logger is {code} zookeeper.root.logger=INFO, CONSOLE {code} Changing the root logger to the value below, or any other value, has no effect on logging {code} zookeeper.root.logger=DEBUG, ROLLINGFILE {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
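[Editorial note: for reference, a minimal log4j.properties fragment that would make ROLLINGFILE the default, as the earlier comment requests. The appender name and property variables follow ZooKeeper's stock conf/log4j.properties; the size and backup values here are illustrative, not the project's defaults.]

```properties
# Route the root logger to the rolling file appender by default
zookeeper.root.logger=INFO, ROLLINGFILE
log4j.rootLogger=${zookeeper.root.logger}

# Rolling file: caps file size (Problem 1) and keeps history across restarts (Problem 2)
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/zookeeper.log
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
log4j.appender.ROLLINGFILE.MaxBackupIndex=10
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n
```

For this to take effect, the startup scripts must not override zookeeper.root.logger on the JVM command line, which is the root cause the JIRA describes.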
[jira] [Commented] (ZOOKEEPER-2217) event might lost before re-watch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603844#comment-14603844 ] Raul Gutierrez Segales commented on ZOOKEEPER-2217: --- getChildren (and getData, etc.) do get the data and set their watches atomically, so how would inverting the order change anything? The only way of getting _every_ intermediate state would be by tailing the transaction logs, but at that point maybe ZooKeeper is not the right tool for the job. event might lost before re-watch Key: ZOOKEEPER-2217 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2217 Project: ZooKeeper Issue Type: Improvement Components: c client, java client Affects Versions: 3.4.5, 3.4.6 Environment: jdk1.7_45 on centos6.5 and ubuntu14.4 Reporter: Caspian I use zk to monitor the children nodes under a path, e.g. /servers. When the client is told that the children changed, I have to re-watch the path again; during that period, it's possible that some children went down, or some came up, and those events will be missed. For now, my temporary solution is not to use getChildren(path, true...) to get the children and re-watch the path, but to re-watch the path first, then get the children. Thus no events are missed, but I don't know what the zk server will be like if there are too many clients that act like this. What do you think of this problem? Are there any other solutions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
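[Editorial note: the point Raul makes is that the read and the watch registration happen as one atomic step on the server, so a client that re-reads with a watch cannot miss that a change happened; it can only see several changes coalesced into one notification. A toy single-threaded model of that guarantee, not ZooKeeper code:]

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a node whose read-and-watch is a single atomic step, like
// ZooKeeper's getChildren(path, watcher). Watches are one-shot, as in ZooKeeper.
class WatchedNode {
    private List<String> children = new ArrayList<>();
    private Runnable watch;  // at most one pending watch

    // Atomic "read + set watch": no update can slip between the two.
    synchronized List<String> getChildrenAndWatch(Runnable w) {
        watch = w;
        return new ArrayList<>(children);
    }

    synchronized void setChildren(List<String> next) {
        children = new ArrayList<>(next);
        if (watch != null) {
            Runnable w = watch;
            watch = null;  // one-shot: consumed on first change
            w.run();
        }
    }
}
```

Two updates between re-reads fire only one notification, but the next atomic read-and-watch always observes the latest state, so nothing is silently lost; only intermediate states are skipped.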
[jira] [Commented] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603650#comment-14603650 ] Flavio Junqueira commented on ZOOKEEPER-2193: - We have added code to QCM so that observers could connect without providing a unique id, but I don't think the way we process configuration supports this feature currently. For example, we check if a peer is observing by checking the server id. The intent was to support this feature, though, to be able to connect observers without having to change the configuration. reconfig command completes even if parameter is wrong obviously --- Key: ZOOKEEPER-2193 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193 Project: ZooKeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.5.0 Environment: CentOS7 + Java7 Reporter: Yasuhito Fukuda Assignee: Yasuhito Fukuda Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193-v3.patch, ZOOKEEPER-2193-v4.patch, ZOOKEEPER-2193-v5.patch, ZOOKEEPER-2193-v6.patch, ZOOKEEPER-2193-v7.patch, ZOOKEEPER-2193-v8.patch, ZOOKEEPER-2193.patch Even if a reconfig parameter is obviously wrong, the command was confirmed to complete. Refer to the following. - The ensemble consists of four nodes {noformat} [zk: vm-101:2181(CONNECTED) 0] config server.1=192.168.100.101:2888:3888:participant server.2=192.168.100.102:2888:3888:participant server.3=192.168.100.103:2888:3888:participant server.4=192.168.100.104:2888:3888:participant version=1 {noformat} - Add a node with the reconfig command {noformat} [zk: vm-101:2181(CONNECTED) 9] reconfig -add server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181 Committed new configuration: server.1=192.168.100.101:2888:3888:participant server.2=192.168.100.102:2888:3888:participant server.3=192.168.100.103:2888:3888:participant server.4=192.168.100.104:2888:3888:participant server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181 version=30007 {noformat} server.4 and server.5 have duplicate IP addresses. 
In this state, leader election will not work properly. Moreover, the ensemble may end up in an undesirable state. I think parameter validation is needed during reconfig. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-175) needed: docs for ops - how to setup acls authentication in the server
[ https://issues.apache.org/jira/browse/ZOOKEEPER-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603675#comment-14603675 ] Albert Taylor commented on ZOOKEEPER-175: - This looks like a pretty good start. https://ihong5.wordpress.com/2014/07/24/apache-zookeeper-setting-acl-in-zookeeper-client/ needed: docs for ops - how to setup acls authentication in the server --- Key: ZOOKEEPER-175 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-175 Project: ZooKeeper Issue Type: Improvement Components: documentation Reporter: Robbie Scott Part of the interest in creating documentation related to security. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2221) Zookeeper JettyAdminServer server should start on configured IP.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603851#comment-14603851 ] Raul Gutierrez Segales commented on ZOOKEEPER-2221: --- Thanks for the patch [~surendrasingh]! A few comments: * the indentation seems off because of tabs, could you please use spaces for indentation to make it consistent with the rest of the file? * could you document the new property (zookeeper.admin.address) in zookeeperAdmin.html? Thanks! Zookeeper JettyAdminServer server should start on configured IP. Key: ZOOKEEPER-2221 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2221 Project: ZooKeeper Issue Type: Bug Components: quorum Affects Versions: 3.5.0 Reporter: Surendra Singh Lilhore Assignee: Surendra Singh Lilhore Attachments: ZOOKEEPER-2221.patch Currently JettyAdminServer starts on IP 0.0.0.0, which means all IP addresses on the local machine. So, if your webserver machine has two IP addresses, 192.168.1.1 (private) and 10.1.2.1 (public), and you allow a webserver daemon like apache to listen on 0.0.0.0, it will be reachable at both of those IPs. This is a security issue: the webserver should be accessible only from the configured IP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
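[Editorial note: the fix amounts to binding the admin server to a configurable address instead of the wildcard. The idea can be sketched with the JDK's built-in HTTP server; this is not the Jetty-based AdminServer code, and zookeeper.admin.address is the property name proposed in the patch.]

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;

public class BindExample {
    public static HttpServer start(int port) throws IOException {
        // Default to loopback rather than 0.0.0.0, so the server is not
        // reachable on every interface of the machine.
        String addr = System.getProperty("zookeeper.admin.address", "127.0.0.1");
        HttpServer server = HttpServer.create(new InetSocketAddress(addr, port), 0);
        server.start();
        return server;
    }
}
```

Binding to a specific address narrows exposure on multi-homed hosts, which is exactly the concern raised in the description.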
[jira] [Commented] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603689#comment-14603689 ] Alexander Shraer commented on ZOOKEEPER-2193: - I'm not sure it would work with reconfig either. In any case, this should be discussed in a separate Jira (in case this is important to support). reconfig command completes even if parameter is wrong obviously --- Key: ZOOKEEPER-2193 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193 Project: ZooKeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.5.0 Environment: CentOS7 + Java7 Reporter: Yasuhito Fukuda Assignee: Yasuhito Fukuda Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193-v3.patch, ZOOKEEPER-2193-v4.patch, ZOOKEEPER-2193-v5.patch, ZOOKEEPER-2193-v6.patch, ZOOKEEPER-2193-v7.patch, ZOOKEEPER-2193-v8.patch, ZOOKEEPER-2193.patch Even if a reconfig parameter is obviously wrong, the command was confirmed to complete. Refer to the following. - The ensemble consists of four nodes {noformat} [zk: vm-101:2181(CONNECTED) 0] config server.1=192.168.100.101:2888:3888:participant server.2=192.168.100.102:2888:3888:participant server.3=192.168.100.103:2888:3888:participant server.4=192.168.100.104:2888:3888:participant version=1 {noformat} - Add a node with the reconfig command {noformat} [zk: vm-101:2181(CONNECTED) 9] reconfig -add server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181 Committed new configuration: server.1=192.168.100.101:2888:3888:participant server.2=192.168.100.102:2888:3888:participant server.3=192.168.100.103:2888:3888:participant server.4=192.168.100.104:2888:3888:participant server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181 version=30007 {noformat} server.4 and server.5 have duplicate IP addresses. In this state, leader election will not work properly. Moreover, the ensemble may end up in an undesirable state. I think parameter validation is needed during reconfig. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2217) event might lost before re-watch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603922#comment-14603922 ] Camille Fournier commented on ZOOKEEPER-2217: - [~caspian] I am closing this jira because this was a fundamental design decision of the system and there seems to be some confusion about the intended usage and behavior. We're happy to discuss this in more depth on the users or dev mailing lists if you are interested in feedback on what you are trying to do. Thanks! event might lost before re-watch Key: ZOOKEEPER-2217 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2217 Project: ZooKeeper Issue Type: Improvement Components: c client, java client Affects Versions: 3.4.5, 3.4.6 Environment: jdk1.7_45 on centos6.5 and ubuntu14.4 Reporter: Caspian I use zk to monitor the children nodes under a path, e.g. /servers. When the client is told that the children changed, I have to re-watch the path again; during that period, it's possible that some children went down, or some came up, and those events will be missed. For now, my temporary solution is not to use getChildren(path, true...) to get the children and re-watch the path, but to re-watch the path first, then get the children. Thus no events are missed, but I don't know what the zk server will be like if there are too many clients that act like this. What do you think of this problem? Are there any other solutions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (ZOOKEEPER-2217) event might lost before re-watch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Camille Fournier resolved ZOOKEEPER-2217. - Resolution: Not A Problem event might lost before re-watch Key: ZOOKEEPER-2217 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2217 Project: ZooKeeper Issue Type: Improvement Components: c client, java client Affects Versions: 3.4.5, 3.4.6 Environment: jdk1.7_45 on centos6.5 and ubuntu14.4 Reporter: Caspian I use zk to monitor the children nodes under a path, e.g. /servers. When the client is told that the children changed, I have to re-watch the path again; during that period, it's possible that some children went down, or some came up, and those events will be missed. For now, my temporary solution is not to use getChildren(path, true...) to get the children and re-watch the path, but to re-watch the path first, then get the children. Thus no events are missed, but I don't know what the zk server will be like if there are too many clients that act like this. What do you think of this problem? Are there any other solutions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602661#comment-14602661 ] Akihiro Suda commented on ZOOKEEPER-2172: - Yes, as many logs as possible might be helpful. Plus some additional information such as the accurate ZK version, workload scripts, or filesystem information might be also helpful. I am trying to reproduce the bug by injecting some {{Thread.sleep()}}s into syncing-related functions using byteman. But I have not been able to reproduce the bug so far, as I am not sure which function should be injected. Cluster crashes when reconfig a new node as a participant - Key: ZOOKEEPER-2172 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172 Project: ZooKeeper Issue Type: Bug Components: leaderElection, quorum, server Affects Versions: 3.5.0 Environment: Ubuntu 12.04 + java 7 Reporter: Ziyou Wang Priority: Critical Attachments: node-1.log, node-2.log, node-3.log, zoo-1.log, zoo-2-1.log, zoo-2-2.log, zoo-2-3.log, zoo-2.log, zoo-2212-1.log, zoo-2212-2.log, zoo-2212-3.log, zoo-3-1.log, zoo-3-2.log, zoo-3-3.log, zoo-3.log, zoo-4-1.log, zoo-4-2.log, zoo-4-3.log, zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next, zookeeper-1.log, zookeeper-2.log, zookeeper-3.log The operations are quite simple: start three zk servers one by one, then reconfig the cluster to add the new one as a participant. When I add the third one, the zk cluster may enter a weird state and cannot recover. I found “2015-04-20 12:53:48,236 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in the node-1 log. So the first node received the reconfig cmd at 12:53:48. Later, it logged “2015-04-20 12:53:52,230 [myid:1] - ERROR [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] - WARN [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE /10.0.0.2:55890 ”. 
From then on, the first node and second node rejected all client connections and the third node didn't join the cluster as a participant. The whole cluster was broken. When the problem happened, all three nodes were using the same dynamic config file zoo.cfg.dynamic.1005d, which only contained the first two nodes. But there was another, unused dynamic config file in the node-1 directory, zoo.cfg.dynamic.next, which already contained three nodes. When I extended the waiting time between starting the third node and reconfiguring the cluster, the problem didn't show up again. So it should be a race condition problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
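[Editorial note: a Byteman rule of the kind Akihiro describes might look like the following. The target class and method are a guess at a syncing-related function, purely illustrative; the delay value is arbitrary.]

```
RULE delay learner sync
CLASS org.apache.zookeeper.server.quorum.Learner
METHOD syncWithLeader
AT ENTRY
IF true
DO Thread.sleep(3000)
ENDRULE
```

Loaded via the Byteman agent, a rule like this widens the suspected race window without modifying ZooKeeper's source, which makes a timing-dependent bug easier to reproduce deterministically.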
[jira] [Updated] (ZOOKEEPER-2140) NettyServerCnxn and NIOServerCnxn code should be improved
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arshad Mohammad updated ZOOKEEPER-2140: --- Attachment: ZOOKEEPER-2140-3.patch NettyServerCnxn and NIOServerCnxn code should be improved - Key: ZOOKEEPER-2140 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2140 Project: ZooKeeper Issue Type: Improvement Reporter: Arshad Mohammad Fix For: 3.6.0 Attachments: ZOOKEEPER-2140-1.patch, ZOOKEEPER-2140-2.patch, ZOOKEEPER-2140-3.patch Classes org.apache.zookeeper.server.NIOServerCnxn and org.apache.zookeeper.server.NettyServerCnxn have the following scope for improvement: 1) Duplicate code. These two classes have around 250 lines of duplicate code. All the command code is duplicated. 2) Many improvements/bug fixes are done in one class but not in the other. These changes should be synced. For example, in NettyServerCnxn {code} // clone should be faster than iteration // ie give up the cnxns lock faster AbstractSet<ServerCnxn> cnxns; synchronized (factory.cnxns) { cnxns = new HashSet<ServerCnxn>(factory.cnxns); } for (ServerCnxn c : cnxns) { c.dumpConnectionInfo(pw, false); pw.println(); } {code} In NIOServerCnxn {code} for (ServerCnxn c : factory.cnxns) { c.dumpConnectionInfo(pw, false); pw.println(); } {code} 3) NettyServerCnxn and NIOServerCnxn are unnecessarily bulky. The command classes have altogether different functionality, so they should go into separate class files. If this is done, it will be easy to add new commands with minimal changes to the existing classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2164) fast leader election keeps failing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602876#comment-14602876 ] Filip Deleersnijder commented on ZOOKEEPER-2164: We experienced a related problem. In a test setup with 6 servers (3.4.6) with 2 servers shut down, leader election could take a very long time (1 to 2 minutes) to complete. Once we changed the cnxTO variable in QuorumCnxManager from 5000 ms to 500 ms, it completed in under 10 seconds again. In a setup with 8 servers (3.4.6) with 2 servers shut down, leader election could take a very long time (we have experienced more than 10 minutes!) to complete, and frequently started again immediately after completing. On Monday we will test our cnxTO fix on this setup as well. fast leader election keeps failing -- Key: ZOOKEEPER-2164 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2164 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.4.5 Reporter: Michi Mutsuzaki Assignee: Hongchao Deng Fix For: 3.5.2, 3.6.0 I have a 3-node cluster with sids 1, 2 and 3. Originally 2 is the leader. When I shut down 2, 1 and 3 keep going back to leader election. Here is what seems to be happening. - Both 1 and 3 elect 3 as the leader. - 1 receives votes from 3 and itself, and starts trying to connect to 3 as a follower. - 3 doesn't receive votes for 5 seconds because connectOne() to 2 doesn't time out for 5 seconds: https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java#L346 - By the time 3 receives votes, 1 has given up trying to connect to 3: https://github.com/apache/zookeeper/blob/41c9fcb3ca09cd3d05e59fe47f08ecf0b85532c8/src/java/main/org/apache/zookeeper/server/quorum/Learner.java#L247 I'm using 3.4.5, but it looks like this part of the code hasn't changed for a while, so I'm guessing later versions have the same issue. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
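[Editorial note: the 5-second stall described above comes from the bounded synchronous connect in QuorumCnxManager (the cnxTO value). The effect of shrinking that timeout can be illustrated with a plain socket connect; this is a stand-alone sketch, not the QuorumCnxManager code, and the unroutable test address is arbitrary.]

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectTimeoutDemo {
    // Attempt a connection with a bounded timeout, analogous to
    // QuorumCnxManager's connectOne(); returns the elapsed milliseconds.
    public static long tryConnect(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
        } catch (IOException e) {
            // expected for an unreachable peer: timeout or host unreachable
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

With a dead peer, each election round blocks for at most the connect timeout, so lowering it from 5000 ms toward 500 ms directly shrinks the stall Filip describes, at the cost of giving slow-but-alive peers less time to answer.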