ZooKeeper-trunk-solaris - Build # 898 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-solaris/898/

###
## LAST 60 LINES OF THE CONSOLE
###
Started by timer
Building remotely on solaris1 (Solaris) in workspace /export/home/hudson/hudson-slave/workspace/ZooKeeper-trunk-solaris
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
	at hudson.remoting.Request.call(Request.java:174)
	at hudson.remoting.Channel.call(Channel.java:739)
	at hudson.EnvVars.getRemote(EnvVars.java:404)
	at hudson.model.Computer.getEnvironment(Computer.java:912)
	at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:29)
	at hudson.model.Run.getEnvironment(Run.java:2221)
	at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:874)
	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:866)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1251)
	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:604)
	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:513)
	at hudson.model.Run.execute(Run.java:1706)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:231)
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.Request.abort(Request.java:299)
	at hudson.remoting.Channel.terminate(Channel.java:802)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2328)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2797)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:802)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

###
## FAILED TESTS (if any)
###
No tests ran.
ZooKeeper-3.4-WinVS2008_java - Build # 492 - Failure
See https://builds.apache.org/job/ZooKeeper-3.4-WinVS2008_java/492/

###
## LAST 60 LINES OF THE CONSOLE
###
[...truncated 188733 lines...]
    [junit] 2014-05-20 10:30:26,834 [myid:] - INFO [main:ClientBase@443] - STARTING server
    [junit] 2014-05-20 10:30:26,834 [myid:] - INFO [main:ClientBase@364] - CREATING server instance 127.0.0.1:11221
    [junit] 2014-05-20 10:30:26,835 [myid:] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221
    [junit] 2014-05-20 10:30:26,836 [myid:] - INFO [main:ClientBase@339] - STARTING server instance 127.0.0.1:11221
    [junit] 2014-05-20 10:30:26,836 [myid:] - INFO [main:ZooKeeperServer@162] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test1982126372498440347.junit.dir\version-2 snapdir f:\hudson\hudson-slave\workspace\ZooKeeper-3.4-WinVS2008_java\branch-3.4\build\test\tmp\test1982126372498440347.junit.dir\version-2
    [junit] 2014-05-20 10:30:26,839 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
    [junit] 2014-05-20 10:30:26,839 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:51465
    [junit] 2014-05-20 10:30:26,840 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@827] - Processing stat command from /127.0.0.1:51465
    [junit] 2014-05-20 10:30:26,840 [myid:] - INFO [Thread-4:NIOServerCnxn$StatCommand@663] - Stat command output
    [junit] 2014-05-20 10:30:26,840 [myid:] - INFO [Thread-4:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:51465 (no session established for client)
    [junit] 2014-05-20 10:30:26,841 [myid:] - INFO [main:JMXEnv@229] - ensureParent:[InMemoryDataTree, StandaloneServer_port]
    [junit] 2014-05-20 10:30:26,842 [myid:] - INFO [main:JMXEnv@246] - expect:InMemoryDataTree
    [junit] 2014-05-20 10:30:26,842 [myid:] - INFO [main:JMXEnv@250] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
    [junit] 2014-05-20 10:30:26,842 [myid:] - INFO [main:JMXEnv@246] - expect:StandaloneServer_port
    [junit] 2014-05-20 10:30:26,843 [myid:] - INFO [main:JMXEnv@250] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
    [junit] 2014-05-20 10:30:26,843 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@55] - Memory used 9305
    [junit] 2014-05-20 10:30:26,843 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@60] - Number of threads 21
    [junit] 2014-05-20 10:30:26,843 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@65] - FINISHED TEST METHOD testQuota
    [junit] 2014-05-20 10:30:26,843 [myid:] - INFO [main:ClientBase@520] - tearDown starting
    [junit] 2014-05-20 10:30:27,000 [myid:] - INFO [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop!
    [junit] 2014-05-20 10:30:27,000 [myid:] - INFO [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop!
    [junit] 2014-05-20 10:30:27,123 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@852] - Socket connection established to 127.0.0.1/127.0.0.1:11221, initiating session
    [junit] 2014-05-20 10:30:27,123 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:51446
    [junit] 2014-05-20 10:30:27,123 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@861] - Client attempting to renew session 0x14618f7b1ac at /127.0.0.1:51446
    [junit] 2014-05-20 10:30:27,124 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:ZooKeeperServer@617] - Established session 0x14618f7b1ac with negotiated timeout 3 for client /127.0.0.1:51446
    [junit] 2014-05-20 10:30:27,124 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1235] - Session establishment complete on server 127.0.0.1/127.0.0.1:11221, sessionid = 0x14618f7b1ac, negotiated timeout = 3
    [junit] 2014-05-20 10:30:27,125 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x14618f7b1ac
    [junit] 2014-05-20 10:30:27,125 [myid:] - INFO [SyncThread:0:FileTxnLog@199] - Creating new log file: log.c
    [junit] 2014-05-20 10:30:27,128 [myid:] - INFO [main:ZooKeeper@684] - Session: 0x14618f7b1ac closed
    [junit] 2014-05-20 10:30:27,128 [myid:] - INFO [main:ClientBase@490] - STOPPING server
    [junit] 2014-05-20 10:30:27,128 [myid:] - INFO [main-EventThread:ClientCnxn$EventThread@512] - EventThread shut down
    [junit] 2014-05-20 10:30:27,128 [myid:] -
[jira] [Commented] (ZOOKEEPER-1576) Zookeeper cluster - failed to connect to cluster if one of the provided IPs causes java.net.UnknownHostException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003010#comment-14003010 ]

Hadoop QA commented on ZOOKEEPER-1576:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12645651/ZOOKEEPER-1576.patch
against trunk revision 1595561.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2104//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2104//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2104//console

This message is automatically generated.

Zookeeper cluster - failed to connect to cluster if one of the provided IPs causes java.net.UnknownHostException
Key: ZOOKEEPER-1576
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1576
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.5.0
Environment: Three 3.4.3 zookeeper servers in cluster, linux.
Reporter: Tally Tsabary
Assignee: Edward Ribeiro
Fix For: 3.5.0
Attachments: ZOOKEEPER-1576-3.4.patch, ZOOKEEPER-1576.3.patch, ZOOKEEPER-1576.4.patch, ZOOKEEPER-1576.5.patch, ZOOKEEPER-1576.patch

Using a cluster of three 3.4.3 zookeeper servers. All the servers are up, but on the client machine, the firewall is blocking one of the servers.
The following exception occurs, and the client does not connect to any of the other cluster members. The exception:

Nov 02, 2012 9:54:32 PM com.netflix.curator.framework.imps.CuratorFrameworkImpl logError
SEVERE: Background exception was not retry-able or retry gave up
java.net.UnknownHostException: scnrmq003.myworkday.com
	at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$1.lookupAllHostAddr(Unknown Source)
	at java.net.InetAddress.getAddressesFromNameService(Unknown Source)
	at java.net.InetAddress.getAllByName0(Unknown Source)
	at java.net.InetAddress.getAllByName(Unknown Source)
	at java.net.InetAddress.getAllByName(Unknown Source)
	at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
	at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:440)
	at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:375)

The code at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60) is:

    public StaticHostProvider(Collection<InetSocketAddress> serverAddresses)
            throws UnknownHostException {
        for (InetSocketAddress address : serverAddresses) {
            InetAddress resolvedAddresses[] = InetAddress.getAllByName(address.getHostName());
            for (InetAddress resolvedAddress : resolvedAddresses) {
                this.serverAddresses.add(new InetSocketAddress(resolvedAddress.getHostAddress(),
                        address.getPort()));
            }
        }
        ..

The for-loop does not try to resolve the rest of the servers on the list if InetAddress.getAllByName(address.getHostName()) throws an UnknownHostException, and that failure aborts the client connection creation. I was expecting the connection to be created for the other members of the cluster. Also, InetAddress.getAllByName is a blocking call, and if it takes a very long time (longer than the defined timeout), that too should allow us to continue trying to connect to the other servers on the list.

Assuming this is fixed and we get a connection to the currently available servers, I think ZooKeeper should keep retrying the not-connected server of the cluster, so it can be used later when it is back. If one of the servers on the list is not available during connection creation, then it should be retried every x time despite the fact that we

--
This message was sent by Atlassian JIRA
(v6.2#6252)
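[Editorial note: the resilient resolution loop the reporter asks for could look like the following minimal, self-contained sketch. This is not the actual ZOOKEEPER-1576 patch; the class name `ResolveAll` and method `resolve` are hypothetical, and only JDK APIs are used.]

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResolveAll {
    // Resolve each server independently; an UnknownHostException for one
    // host is logged and skipped instead of aborting the whole list.
    public static List<InetSocketAddress> resolve(List<InetSocketAddress> servers) {
        List<InetSocketAddress> resolved = new ArrayList<>();
        for (InetSocketAddress address : servers) {
            try {
                for (InetAddress a : InetAddress.getAllByName(address.getHostName())) {
                    resolved.add(new InetSocketAddress(a.getHostAddress(), address.getPort()));
                }
            } catch (UnknownHostException e) {
                // Skip the unresolvable host rather than failing client construction.
                System.err.println("Skipping unresolvable host: " + address.getHostName());
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<InetSocketAddress> servers = Arrays.asList(
                InetSocketAddress.createUnresolved("localhost", 2181),
                // ".invalid" is reserved (RFC 2606) and never resolves.
                InetSocketAddress.createUnresolved("no-such-host.invalid", 2181));
        List<InetSocketAddress> out = resolve(servers);
        // localhost resolves; the .invalid host is skipped instead of fatal.
        System.out.println(!out.isEmpty());
    }
}
```

With this shape of loop, one firewalled or unresolvable server degrades the candidate list instead of preventing the client from connecting at all.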
Failed: ZOOKEEPER-1576 PreCommit Build #2104
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1576
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2104/

###
## LAST 60 LINES OF THE CONSOLE
###
[...truncated 238462 lines...]
     [exec]
     [exec]
     [exec]
     [exec] -1 overall. Here are the results of testing the latest attachment
     [exec] http://issues.apache.org/jira/secure/attachment/12645651/ZOOKEEPER-1576.patch
     [exec] against trunk revision 1595561.
     [exec]
     [exec] +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec] +1 tests included. The patch appears to include 3 new or modified tests.
     [exec]
     [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
     [exec]
     [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
     [exec]
     [exec] -1 core tests. The patch failed core unit tests.
     [exec]
     [exec] +1 contrib tests. The patch passed contrib unit tests.
     [exec]
     [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2104//testReport/
     [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2104//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
     [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2104//console
     [exec]
     [exec] This message is automatically generated.
     [exec]
     [exec]
     [exec] ==
     [exec] ==
     [exec] Adding comment to Jira.
     [exec] ==
     [exec] ==
     [exec]
     [exec]
     [exec] Comment added.
     [exec] 409b4acd7120f3f3994d6191119a983cc5acee7e logged out
     [exec]
     [exec]
     [exec] ==
     [exec] ==
     [exec] Finished build.
     [exec] ==
     [exec] ==
     [exec]
     [exec]
BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1696: exec returned: 1
Total time: 39 minutes 12 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-1576
Email was triggered for: Failure
Sending email for trigger: Failure

###
## FAILED TESTS (if any)
###
All tests passed
[jira] [Commented] (ZOOKEEPER-1459) Standalone ZooKeeperServer is not closing the transaction log files on shutdown
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003013#comment-14003013 ]

Grzegorz Grzybek commented on ZOOKEEPER-1459:
---------------------------------------------

Is this really fixed? I don't see the change here: http://svn.apache.org/viewvc/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java?view=markup...

regards
Grzegorz Grzybek

Standalone ZooKeeperServer is not closing the transaction log files on shutdown
-------------------------------------------------------------------------------
Key: ZOOKEEPER-1459
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1459
Project: ZooKeeper
Issue Type: Sub-task
Components: server
Affects Versions: 3.4.0
Reporter: Rakesh R
Assignee: Rakesh R
Fix For: 3.4.6, 3.5.0
Attachments: ZOOKEEPER-1459-branch-3_4.patch, ZOOKEEPER-1459-branch-3_4.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch

When shutting down the standalone ZK server, it only clears the zkdatabase and does not close the transaction log streams. Deleting the temporary files in unit tests on Windows then fails.

ZooKeeperServer.java
{noformat}
if (zkDb != null) {
    zkDb.clear();
}
{noformat}

Suggestion: close the zkDb as follows, which in turn will take care of the transaction logs:
{noformat}
if (zkDb != null) {
    zkDb.clear();
    try {
        zkDb.close();
    } catch (IOException ie) {
        LOG.warn("Error closing logs", ie);
    }
}
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
ZooKeeper-trunk-WinVS2008_java - Build # 736 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008_java/736/

###
## LAST 60 LINES OF THE CONSOLE
###
[...truncated 267892 lines...]
    [junit] 2014-05-20 10:52:01,237 [myid:] - INFO [main:NIOServerCnxnFactory@683] - binding to port 0.0.0.0/0.0.0.0:11221
    [junit] 2014-05-20 10:52:01,238 [myid:] - INFO [main:ClientBase@339] - STARTING server instance 127.0.0.1:11221
    [junit] 2014-05-20 10:52:01,238 [myid:] - INFO [main:ZooKeeperServer@766] - minSessionTimeout set to 6000
    [junit] 2014-05-20 10:52:01,238 [myid:] - INFO [main:ZooKeeperServer@775] - maxSessionTimeout set to 6
    [junit] 2014-05-20 10:52:01,238 [myid:] - INFO [main:ZooKeeperServer@149] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test8389610335436682810.junit.dir\version-2 snapdir f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test8389610335436682810.junit.dir\version-2
    [junit] 2014-05-20 10:52:01,240 [myid:] - INFO [main:FileSnap@83] - Reading snapshot f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test8389610335436682810.junit.dir\version-2\snapshot.b
    [junit] 2014-05-20 10:52:01,241 [myid:] - INFO [main:FileTxnSnapLog@298] - Snapshotting: 0xb to f:\hudson\hudson-slave\workspace\ZooKeeper-trunk-WinVS2008_java\trunk\build\test\tmp\test8389610335436682810.junit.dir\version-2\snapshot.b
    [junit] 2014-05-20 10:52:01,243 [myid:] - INFO [main:FourLetterWordMain@43] - connecting to 127.0.0.1 11221
    [junit] 2014-05-20 10:52:01,244 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:57398
    [junit] 2014-05-20 10:52:01,245 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@835] - Processing stat command from /127.0.0.1:57398
    [junit] 2014-05-20 10:52:01,245 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn$StatCommand@684] - Stat command output
    [junit] 2014-05-20 10:52:01,246 [myid:] - INFO [NIOWorkerThread-1:NIOServerCnxn@1006] - Closed socket connection for client /127.0.0.1:57398 (no session established for client)
    [junit] 2014-05-20 10:52:01,246 [myid:] - INFO [main:JMXEnv@224] - ensureParent:[InMemoryDataTree, StandaloneServer_port]
    [junit] 2014-05-20 10:52:01,248 [myid:] - INFO [main:JMXEnv@241] - expect:InMemoryDataTree
    [junit] 2014-05-20 10:52:01,248 [myid:] - INFO [main:JMXEnv@245] - found:InMemoryDataTree org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
    [junit] 2014-05-20 10:52:01,248 [myid:] - INFO [main:JMXEnv@241] - expect:StandaloneServer_port
    [junit] 2014-05-20 10:52:01,248 [myid:] - INFO [main:JMXEnv@245] - found:StandaloneServer_port org.apache.ZooKeeperService:name0=StandaloneServer_port-1
    [junit] 2014-05-20 10:52:01,249 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@55] - Memory used 13339
    [junit] 2014-05-20 10:52:01,249 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@60] - Number of threads 23
    [junit] 2014-05-20 10:52:01,249 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@65] - FINISHED TEST METHOD testQuota
    [junit] 2014-05-20 10:52:01,249 [myid:] - INFO [main:ClientBase@520] - tearDown starting
    [junit] 2014-05-20 10:52:01,476 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@963] - Socket connection established to 127.0.0.1/127.0.0.1:11221, initiating session
    [junit] 2014-05-20 10:52:01,478 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:57393
    [junit] 2014-05-20 10:52:01,479 [myid:] - INFO [NIOWorkerThread-2:ZooKeeperServer@858] - Client attempting to renew session 0x146190b66c9 at /127.0.0.1:57393
    [junit] 2014-05-20 10:52:01,480 [myid:] - INFO [NIOWorkerThread-2:ZooKeeperServer@604] - Established session 0x146190b66c9 with negotiated timeout 3 for client /127.0.0.1:57393
    [junit] 2014-05-20 10:52:01,480 [myid:] - INFO [main-SendThread(127.0.0.1:11221):ClientCnxn$SendThread@1346] - Session establishment complete on server 127.0.0.1/127.0.0.1:11221, sessionid = 0x146190b66c9, negotiated timeout = 3
    [junit] 2014-05-20 10:52:01,481 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@685] - Processed session termination for sessionid: 0x146190b66c9
    [junit] 2014-05-20 10:52:01,482 [myid:] - INFO [SyncThread:0:FileTxnLog@200] - Creating new log file: log.c
    [junit] 2014-05-20 10:52:01,503 [myid:] - INFO [main:ZooKeeper@968] - Session: 0x146190b66c9 closed
    [junit] 2014-05-20 10:52:01,503
ZooKeeper-trunk - Build # 2311 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk/2311/

###
## LAST 60 LINES OF THE CONSOLE
###
[...truncated 238605 lines...]
     [exec] Zookeeper_simpleSystem::testIPV6 : elapsed 1026 : OK
     [exec] Zookeeper_simpleSystem::testCreate : elapsed 1009 : OK
     [exec] Zookeeper_simpleSystem::testPath : elapsed 1018 : OK
     [exec] Zookeeper_simpleSystem::testPathValidation : elapsed 1036 : OK
     [exec] Zookeeper_simpleSystem::testPing : elapsed 17351 : OK
     [exec] Zookeeper_simpleSystem::testAcl : elapsed 1013 : OK
     [exec] Zookeeper_simpleSystem::testChroot : elapsed 3060 : OK
     [exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started ZooKeeper server started : elapsed 30567 : OK
     [exec] Zookeeper_simpleSystem::testHangingClient : elapsed 1025 : OK
     [exec] Zookeeper_simpleSystem::testWatcherAutoResetWithGlobal ZooKeeper server started ZooKeeper server started ZooKeeper server started : elapsed 14870 : OK
     [exec] Zookeeper_simpleSystem::testWatcherAutoResetWithLocal ZooKeeper server started ZooKeeper server started ZooKeeper server started : elapsed 15908 : OK
     [exec] Zookeeper_simpleSystem::testGetChildren2 : elapsed 1031 : OK
     [exec] Zookeeper_simpleSystem::testLastZxid : elapsed 4538 : OK
     [exec] Zookeeper_simpleSystem::testRemoveWatchers ZooKeeper server started : elapsed 4349 : OK
     [exec] Zookeeper_watchers::testDefaultSessionWatcher1 : elapsed 51 : OK
     [exec] Zookeeper_watchers::testDefaultSessionWatcher2 : elapsed 4 : OK
     [exec] Zookeeper_watchers::testObjectSessionWatcher1 : elapsed 53 : OK
     [exec] Zookeeper_watchers::testObjectSessionWatcher2 : elapsed 55 : OK
     [exec] Zookeeper_watchers::testNodeWatcher1 : assertion : elapsed 1033
     [exec] Zookeeper_watchers::testChildWatcher1 : elapsed 54 : OK
     [exec] Zookeeper_watchers::testChildWatcher2 : elapsed 54 : OK
     [exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestWatchers.cc:667: Assertion: assertion failed [Expression: ensureCondition(deliveryTracker.deliveryCounterEquals(2),1000)<1000]
     [exec] Failures !!!
     [exec] Run: 71 Failure total: 1 Failures: 1 Errors: 0
     [exec] FAIL: zktest-mt
     [exec] ==
     [exec] 1 of 2 tests failed
     [exec] Please report to u...@zookeeper.apache.org
     [exec] ==
     [exec] make[1]: *** [check-TESTS] Error 1
     [exec] make[1]: Leaving directory `/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build/test/test-cppunit'
     [exec] make: *** [check-am] Error 2
BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build.xml:1426: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build.xml:1386: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/build.xml:1396: exec returned: 2
Total time: 37 minutes 21 seconds
Build step 'Execute shell' marked build as failure
[FINDBUGS] Skipping publisher since build result is FAILURE
[WARNINGS] Skipping publisher since build result is FAILURE
Archiving artifacts
Recording fingerprints
Updating ZOOKEEPER-657
Updating ZOOKEEPER-1891
Updating ZOOKEEPER-1864
Updating ZOOKEEPER-1895
Updating ZOOKEEPER-1214
Updating ZOOKEEPER-1797
Updating ZOOKEEPER-1923
Updating ZOOKEEPER-1836
Updating ZOOKEEPER-1791
Updating ZOOKEEPER-1062
Updating ZOOKEEPER-1926
Recording test results
Publishing Javadoc
Email was triggered for: Failure
Sending email for trigger: Failure

###
## FAILED TESTS (if any)
###
All tests passed
[jira] [Commented] (ZOOKEEPER-1891) StaticHostProviderTest.testUpdateLoadBalancing times out
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003083#comment-14003083 ]

Hudson commented on ZOOKEEPER-1891:
-----------------------------------

FAILURE: Integrated in ZooKeeper-trunk #2311 (See [https://builds.apache.org/job/ZooKeeper-trunk/2311/])
ZOOKEEPER-1891. StaticHostProviderTest.testUpdateLoadBalancing times out (Michi Mutsuzaki via rakeshr) (rakeshr: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1593682)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/client/StaticHostProvider.java

StaticHostProviderTest.testUpdateLoadBalancing times out
Key: ZOOKEEPER-1891
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1891
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.5.0
Environment: ubuntu 13.10
Server environment:java.version=1.7.0_51
Server environment:java.vendor=Oracle Corporation
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
Fix For: 3.5.0
Attachments: StaticHostProviderTest.log, ZOOKEEPER-1891.patch, ZOOKEEPER-1891.patch

StaticHostProviderTest.testUpdateLoadBalancing is consistently timing out on my box. I'll attach a log file.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (ZOOKEEPER-1895) update all notice files, copyright, etc... with the new year - 2014
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003085#comment-14003085 ]

Hudson commented on ZOOKEEPER-1895:
-----------------------------------

FAILURE: Integrated in ZooKeeper-trunk #2311 (See [https://builds.apache.org/job/ZooKeeper-trunk/2311/])
ZOOKEEPER-1895. update all notice files, copyright, etc... with the new year - 2014 (michim: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595273)
* /zookeeper/trunk/NOTICE.txt

update all notice files, copyright, etc... with the new year - 2014
-------------------------------------------------------------------
Key: ZOOKEEPER-1895
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1895
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.4.7, 3.5.0
Reporter: Patrick Hunt
Assignee: Michi Mutsuzaki
Priority: Blocker
Fix For: 3.4.7, 3.5.0
Attachments: ZOOKEEPER-1895.patch

From a note on the list:
Hi folks! This is a reminder to update the year in the NOTICE files from 2013 (or older) to 2014. From a legal POV this is not that important as some say. But nonetheless it's good to update the year.
LieGrue,
strub

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (ZOOKEEPER-657) Cut down the running time of ZKDatabase corruption.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003082#comment-14003082 ]

Hudson commented on ZOOKEEPER-657:
----------------------------------

FAILURE: Integrated in ZooKeeper-trunk #2311 (See [https://builds.apache.org/job/ZooKeeper-trunk/2311/])
ZOOKEEPER-657. Cut down the running time of ZKDatabase corruption (Michi Mutsuzaki via rakeshr) (rakeshr: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594755)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/ZkDatabaseCorruptionTest.java

Cut down the running time of ZKDatabase corruption.
---------------------------------------------------
Key: ZOOKEEPER-657
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-657
Project: ZooKeeper
Issue Type: Improvement
Components: tests
Reporter: Mahadev konar
Assignee: Michi Mutsuzaki
Fix For: 3.4.7, 3.5.0
Attachments: ZOOKEEPER-657.patch

The ZkDatabaseCorruption test takes around 180 seconds right now. It just brings a quorum cluster up and down and corrupts some snapshots. We need to investigate why it takes that long and make it shorter so that our test run times are smaller.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (ZOOKEEPER-1864) quorumVerifier is null when creating a QuorumPeerConfig from parsing a Properties object
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003084#comment-14003084 ]

Hudson commented on ZOOKEEPER-1864:
-----------------------------------

FAILURE: Integrated in ZooKeeper-trunk #2311 (See [https://builds.apache.org/job/ZooKeeper-trunk/2311/])
ZOOKEEPER-1864. quorumVerifier is null when creating a QuorumPeerConfig from parsing a Properties object (Michi Mutsuzaki via rakeshr) (rakeshr: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595443)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java

quorumVerifier is null when creating a QuorumPeerConfig from parsing a Properties object
Key: ZOOKEEPER-1864
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1864
Project: ZooKeeper
Issue Type: Bug
Components: server
Reporter: some one
Assignee: Michi Mutsuzaki
Fix For: 3.5.0
Attachments: BackwardsCompatCheck.patch, ZOOKEEPER-1864.patch

This bug was found when using ZK 3.5.0 with curator-test 2.3.0. curator-test builds a QuorumPeerConfig from a Properties object, and when we then try to run the quorum peer using that configuration, we get an NPE:

{noformat}
2014-01-19 21:58:39,768 [myid:] - ERROR [Thread-3:TestingZooKeeperServer$1@138] - From testing server (random state: false)
java.lang.NullPointerException
	at org.apache.zookeeper.server.quorum.QuorumPeer.setQuorumVerifier(QuorumPeer.java:1320)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:156)
	at org.apache.curator.test.TestingZooKeeperServer$1.run(TestingZooKeeperServer.java:134)
	at java.lang.Thread.run(Thread.java:722)
{noformat}

This happens because QuorumPeerConfig.parseProperties only performs a subset of what QuorumPeerConfig.parse(String path) does. The additional task performed there that we need in parseProperties is the dynamic config backwards compatibility check:

{noformat}
// backward compatibility - dynamic configuration in the same file as static configuration params
// see writeDynamicConfig() - we change the config file to new format if reconfig happens
if (dynamicConfigFileStr == null) {
    configBackwardCompatibilityMode = true;
    configFileStr = path;
    parseDynamicConfig(cfg, electionAlg, true);
    checkValidity();
}
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (ZOOKEEPER-1214) QuorumPeer should unregister only its previously registered MBeans instead of using MBeanRegistry.unregisterAll() method.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14003086#comment-14003086 ] Hudson commented on ZOOKEEPER-1214: --- FAILURE: Integrated in ZooKeeper-trunk #2311 (See [https://builds.apache.org/job/ZooKeeper-trunk/2311/]) ZOOKEEPER-1214. QuorumPeer should unregister only its previsously registered MBeans instead of use MBeanRegistry.unregisterAll() method. (César Álvarez Núñez via michim) (michim: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1595561) * /zookeeper/trunk/CHANGES.txt * /zookeeper/trunk/src/java/main/org/apache/zookeeper/jmx/MBeanRegistry.java * /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java * /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/QuorumUtil.java * /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/QuorumUtilTest.java QuorumPeer should unregister only its previsously registered MBeans instead of use MBeanRegistry.unregisterAll() method. Key: ZOOKEEPER-1214 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1214 Project: ZooKeeper Issue Type: Bug Components: quorum Reporter: César Álvarez Núñez Assignee: César Álvarez Núñez Fix For: 3.5.0 Attachments: ZOOKEEPER-1214.2.patch, ZOOKEEPER-1214.3.patch, ZOOKEEPER-1214.patch, ZOOKEEPER-1214.patch, ZOOKEEPER-1214.patch, ZOOKEEPER-1214.patch When a QuorumPeer thread dies, it is unregistering *all* ZKMBeanInfo MBeans previously registered on its java process; including those that has not been registered by itself. It does not cause any side effect in production environment where each server is running on a separate java process; but fails when using org.apache.zookeeper.test.QuorumUtil to programmatically start up a zookeeper server ensemble and use its provided methods to force Disconnected, SyncConnected or SessionExpired events; in order to perform some basic/functional testing. 
Scenario: * QuorumUtil qU = new QuorumUtil(1); // Creates a 3-server ensemble. * qU.startAll(); // Starts all servers: 1 leader + 2 followers. * qU.shutdown(i); // i is a number from 1 to 3; shuts down one server. The last call causes a QuorumPeer to die and invoke the MBeanRegistry.unregisterAll() method. As a result, *all* ZKMBeanInfo MBeans are unregistered, including those belonging to the other QuorumPeer instances. When trying to restart the stopped server (qU.restart(i)), an AssertionError is thrown in the MBeanRegistry.register(ZKMBeanInfo bean, ZKMBeanInfo parent) method, leaving the QuorumPeer thread dead. To solve it: * The MBeanRegistry.unregisterAll() method has been removed. * QuorumPeer only unregisters its own ZKMBeanInfo MBeans. -- This message was sent by Atlassian JIRA (v6.2#6252)
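The per-peer bookkeeping the fix describes can be sketched with plain JDK JMX. This is an illustrative, hypothetical example (class names and the "demo" ObjectName domain are invented, not ZooKeeper's actual MBeanRegistry): each peer remembers exactly the ObjectNames it registered and, on shutdown, unregisters only those, so other peers' MBeans in the same JVM survive.

```java
import java.lang.management.ManagementFactory;
import java.util.HashSet;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class PerPeerRegistry {
    // Minimal standard MBean (interface name = impl name + "MBean").
    public interface DummyMBean { int getId(); }
    public static class Dummy implements DummyMBean {
        private final int id;
        public Dummy(int id) { this.id = id; }
        @Override public int getId() { return id; }
    }

    private final MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    private final Set<ObjectName> owned = new HashSet<>();   // only *this* peer's beans

    public void register(int id) {
        try {
            ObjectName name = new ObjectName("demo:type=Dummy,id=" + id);
            server.registerMBean(new Dummy(id), name);
            owned.add(name);                  // track what this peer registered
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    public void shutdown() {
        try {
            for (ObjectName name : owned) {
                server.unregisterMBean(name); // never touches other peers' beans
            }
            owned.clear();
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    public static boolean registered(int id) {
        try {
            return ManagementFactory.getPlatformMBeanServer()
                    .isRegistered(new ObjectName("demo:type=Dummy,id=" + id));
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        PerPeerRegistry p1 = new PerPeerRegistry();
        PerPeerRegistry p2 = new PerPeerRegistry();
        p1.register(7);
        p2.register(8);
        p1.shutdown(); // bean 7 is gone, bean 8 is still registered
        System.out.println(registered(7) + " " + registered(8));
        p2.shutdown();
    }
}
```

The key design point is that ownership is recorded at registration time, which is what makes a blanket unregisterAll() unnecessary.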
[jira] [Commented] (ZOOKEEPER-1926) Unit tests should only use build/test/data for data
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003092#comment-14003092 ] Hudson commented on ZOOKEEPER-1926: --- FAILURE: Integrated in ZooKeeper-trunk #2311 (See [https://builds.apache.org/job/ZooKeeper-trunk/2311/]) ZOOKEEPER-1926. Unit tests should only use build/test/data for data (Enis Soztutar via michim) (michim: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1593624) * /zookeeper/trunk/CHANGES.txt * /zookeeper/trunk/src/java/systest/org/apache/zookeeper/test/system/BaseSysTest.java * /zookeeper/trunk/src/java/systest/org/apache/zookeeper/test/system/QuorumPeerInstance.java * /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/LearnerTest.java * /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java Unit tests should only use build/test/data for data --- Key: ZOOKEEPER-1926 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1926 Project: ZooKeeper Issue Type: Bug Components: tests Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 3.4.7, 3.5.0 Attachments: zookeeper-1926_v1-branch-3.4.patch, zookeeper-1926_v1.patch Some of the unit tests create temp files under the system tmp dir (/tmp) and put data there. We should keep all temporary data from unit tests under build/test/data; ant clean will then remove all data from previous runs. -- This message was sent by Atlassian JIRA (v6.2#6252)
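The convention above can be sketched as a small helper. This is a hypothetical illustration, not ZooKeeper's actual test utility: the class name TestDirs and the "test.data.dir" override property are invented; the point is simply that scratch directories are created under build/test/data instead of the system temp dir.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TestDirs {
    // Base directory for all test scratch data; overridable for CI layouts.
    public static File testDataDir() {
        return new File(System.getProperty("test.data.dir", "build/test/data"));
    }

    // Create a unique scratch directory under build/test/data, never under /tmp,
    // so that "ant clean" (which removes build/) wipes all leftovers.
    public static File createTmpDir(String prefix) {
        try {
            File base = testDataDir();
            if (!base.isDirectory() && !base.mkdirs()) {
                throw new IOException("cannot create " + base);
            }
            File tmp = File.createTempFile(prefix, null, base); // placed under base
            if (!tmp.delete() || !tmp.mkdir()) {
                throw new IOException("cannot create tmp dir " + tmp);
            }
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(createTmpDir("test").getPath());
    }
}
```

A usage note: tests would call TestDirs.createTmpDir("snapdir") wherever they previously used File.createTempFile with no directory argument.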
[jira] [Commented] (ZOOKEEPER-1836) addrvec_next() fails to set next parameter if addrvec_hasnext() returns false
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003089#comment-14003089 ] Hudson commented on ZOOKEEPER-1836: --- FAILURE: Integrated in ZooKeeper-trunk #2311 (See [https://builds.apache.org/job/ZooKeeper-trunk/2311/]) ZOOKEEPER-1836. addrvec_next() fails to set next parameter if addrvec_hasnext() returns false (Dutch T. Meyer via michim) (michim: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595038) * /zookeeper/trunk/CHANGES.txt * /zookeeper/trunk/src/c/src/addrvec.c addrvec_next() fails to set next parameter if addrvec_hasnext() returns false - Key: ZOOKEEPER-1836 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1836 Project: ZooKeeper Issue Type: Bug Components: c client Reporter: Dutch T. Meyer Assignee: Dutch T. Meyer Priority: Trivial Fix For: 3.5.0 Attachments: ZOOKEEPER-1836.patch, ZOOKEEPER-1836.patch There is a relatively innocuous but useless pointer assignment in addrvec_next():
195 void addrvec_next(addrvec_t *avec, struct sockaddr_storage *next)
203     if (!addrvec_hasnext(avec))
204     {
205         next = NULL;
206         return;
The assignment on line 205 has no effect, as next is a local copy of the pointer that is lost when the function returns. Likely this should be a memset to zero out the caller's actual sockaddr_storage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ZOOKEEPER-1923) A typo in zookeeperStarted document
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003088#comment-14003088 ] Hudson commented on ZOOKEEPER-1923: --- FAILURE: Integrated in ZooKeeper-trunk #2311 (See [https://builds.apache.org/job/ZooKeeper-trunk/2311/]) ZOOKEEPER-1923. A typo in zookeeperStarted document (Chengwei Yang via michim) (michim: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1593428) * /zookeeper/trunk/CHANGES.txt * /zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperStarted.xml A typo in zookeeperStarted document --- Key: ZOOKEEPER-1923 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1923 Project: ZooKeeper Issue Type: Bug Components: documentation Affects Versions: 3.4.6 Environment: The trunk branch Reporter: Chengwei Yang Assignee: Chengwei Yang Fix For: 3.5.0 Attachments: ZOOKEEPER-1923.patch There is a typo in the zookeeperStarted document (see http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html): in the section *Connecting to ZooKeeper*, the *help* output shows *createpath*, which should be *create path*. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ZOOKEEPER-1062) Net-ZooKeeper: Net::ZooKeeper consumes 100% cpu on wait
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003091#comment-14003091 ] Hudson commented on ZOOKEEPER-1062: --- FAILURE: Integrated in ZooKeeper-trunk #2311 (See [https://builds.apache.org/job/ZooKeeper-trunk/2311/]) ZOOKEEPER-1062. Net-ZooKeeper: Net::ZooKeeper consumes 100% cpu on wait (Botond Hejj via michim) (michim: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595374) * /zookeeper/trunk/CHANGES.txt * /zookeeper/trunk/src/contrib/zkperl/ZooKeeper.xs Net-ZooKeeper: Net::ZooKeeper consumes 100% cpu on wait --- Key: ZOOKEEPER-1062 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1062 Project: ZooKeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.3.1, 3.4.5, 3.4.6 Reporter: Patrick Hunt Assignee: Botond Hejj Labels: patch Fix For: 3.4.7, 3.5.0 Attachments: ZOOKEEPER-1062.patch, ZOOKEEPER-1062.patch Reported by a user on the CDH user list (the user reports that the listed fix addressed the issue for him): Net::ZooKeeper consumes 100% CPU when wait is used. On initial inspection, it appears to be caused by a mistaken use of pthread_cond_timedwait. https://rt.cpan.org/Public/Bug/Display.html?id=61290 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ZOOKEEPER-1791) ZooKeeper package includes unnecessary jars that are part of the package.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003090#comment-14003090 ] Hudson commented on ZOOKEEPER-1791: --- FAILURE: Integrated in ZooKeeper-trunk #2311 (See [https://builds.apache.org/job/ZooKeeper-trunk/2311/]) ZOOKEEPER-1791. ZooKeeper package includes unnecessary jars that are part of the package. (mahadev via michim) (michim: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1595559) * /zookeeper/trunk/CHANGES.txt * /zookeeper/trunk/ivy.xml * /zookeeper/trunk/src/contrib/build.xml ZooKeeper package includes unnecessary jars that are part of the package. - Key: ZOOKEEPER-1791 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1791 Project: ZooKeeper Issue Type: Bug Components: build Affects Versions: 3.5.0 Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 3.5.0 Attachments: ZOOKEEPER-1791.patch ZooKeeper package includes unnecessary jars that are part of the package. Artifacts like fatjar and
{code}
maven-ant-tasks-2.1.3.jar
maven-artifact-2.2.1.jar
maven-artifact-manager-2.2.1.jar
maven-error-diagnostics-2.2.1.jar
maven-model-2.2.1.jar
maven-plugin-registry-2.2.1.jar
maven-profile-2.2.1.jar
maven-project-2.2.1.jar
maven-repository-metadata-2.2.1.jar
{code}
are part of the zookeeper package and rpm (via bigtop). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ZOOKEEPER-1459) Standalone ZooKeeperServer is not closing the transaction log files on shutdown
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003098#comment-14003098 ] Rakesh R commented on ZOOKEEPER-1459: - Hi [~gzres], I think you are checking the wrong file. Please see the file below to understand more: https://svn.apache.org/repos/asf/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServerMain.java Standalone ZooKeeperServer is not closing the transaction log files on shutdown --- Key: ZOOKEEPER-1459 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1459 Project: ZooKeeper Issue Type: Sub-task Components: server Affects Versions: 3.4.0 Reporter: Rakesh R Assignee: Rakesh R Fix For: 3.4.6, 3.5.0 Attachments: ZOOKEEPER-1459-branch-3_4.patch, ZOOKEEPER-1459-branch-3_4.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch, ZOOKEEPER-1459.patch When the standalone ZK server is shut down, it only clears the ZKDatabase and does not close the transaction log streams. As a result, deleting the temporary files in unit tests on Windows fails. ZooKeeperServer.java {noformat} if (zkDb != null) { zkDb.clear(); } {noformat} Suggestion: close the zkDb as follows; this in turn will take care of the transaction logs: {noformat} if (zkDb != null) { zkDb.clear(); try { zkDb.close(); } catch (IOException ie) { LOG.warn("Error closing logs", ie); } } {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
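The shape of the suggested shutdown change can be sketched in isolation. This is a hedged, self-contained illustration, not the real ZooKeeperServer code: Db stands in for ZKDatabase, and the point is that clear() alone leaves file handles open, which is why Windows cannot delete the log files until close() is also called.

```java
import java.io.Closeable;
import java.io.IOException;

public class ShutdownSketch {
    // Stand-in for ZKDatabase: clear() drops in-memory state,
    // close() releases the transaction log streams.
    public static class Db implements Closeable {
        public boolean cleared;
        public boolean closed;
        public void clear() { cleared = true; }
        @Override public void close() throws IOException { closed = true; }
    }

    // Mirrors the suggestion in the issue: clear, then close, and only
    // log (never propagate) an IOException raised while closing logs.
    public static void shutdown(Db zkDb) {
        if (zkDb != null) {
            zkDb.clear();
            try {
                zkDb.close(); // closes FileTxnLog streams in the real server
            } catch (IOException ie) {
                System.err.println("Error closing logs: " + ie);
            }
        }
    }

    public static void main(String[] args) {
        Db db = new Db();
        shutdown(db);
        System.out.println(db.cleared + " " + db.closed);
        shutdown(null); // null-safe, as in the suggested snippet
    }
}
```

Swallowing the IOException with a warning keeps shutdown best-effort, matching the {noformat} snippet in the issue description.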
[jira] [Commented] (ZOOKEEPER-1459) Standalone ZooKeeperServer is not closing the transaction log files on shutdown
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003106#comment-14003106 ] Grzegorz Grzybek commented on ZOOKEEPER-1459: - But shouldn't {{ZooKeeperServer}}'s {{shutdown}} do the same? We use {{ZooKeeperServer}} here: https://github.com/grgrzybek/fabric8/blob/master/fabric/fabric-zookeeper/src/main/java/io/fabric8/zookeeper/bootstrap/ZooKeeperServerFactory.java#L176 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ZOOKEEPER-1459) Standalone ZooKeeperServer is not closing the transaction log files on shutdown
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003326#comment-14003326 ] Rakesh R commented on ZOOKEEPER-1459: - This change causes the following exception; I haven't found the reason yet. Please run the test ReadOnlyModeTest#testReadOnlyClient and see it. {code} 2014-05-20 18:02:39,766 [myid:] - ERROR [SyncThread:1:ZooKeeperCriticalThread@47] - Severe unrecoverable error, from thread : SyncThread:1 java.nio.channels.ClosedChannelException at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:88) at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:243) at org.apache.zookeeper.server.persistence.Util.padLogFile(Util.java:215) at org.apache.zookeeper.server.persistence.FileTxnLog.padFile(FileTxnLog.java:239) at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:217) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:372) at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:542) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:122) {code} I've gone through your code. Since you have a handle to the FileTxnSnapLog (ftxn), please call ftxn.close() after server#shutdown() in your code. Could you use ZooKeeperServerMain#initializeAndRun() and ZooKeeperServerMain#shutdown() for embedding the standalone server? I feel ZOOKEEPER-1072 could be addressed to define the interfaces clearly for users.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ZOOKEEPER-1659) Add JMX support for dynamic reconfiguration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003426#comment-14003426 ] Rakesh R commented on ZOOKEEPER-1659: - After the unregister, if anyone (for example, a monitoring tool) queries the attribute value, the following exception occurs. {code} javax.management.InstanceNotFoundException: org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.3,name2=Leader,name3=InMemoryDataTree at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094) {code} Add JMX support for dynamic reconfiguration --- Key: ZOOKEEPER-1659 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1659 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.0 Reporter: Alexander Shraer Assignee: Rakesh R Priority: Blocker Fix For: 3.5.0 Attachments: ZOOKEEPER-1659.patch We need to update JMX during reconfigurations. Currently, reconfiguration changes are not reflected in JConsole. -- This message was sent by Atlassian JIRA (v6.2#6252)
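The failure mode Rakesh describes is easy to reproduce with plain JDK JMX: once an MBean is unregistered, any attribute query against its stale ObjectName throws InstanceNotFoundException. The sketch below is illustrative only (the "demo:type=Stat" name and classes are invented, not ZooKeeper's MBeans).

```java
import java.lang.management.ManagementFactory;
import javax.management.InstanceNotFoundException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class UnregisterDemo {
    public interface StatMBean { long getCount(); }
    public static class Stat implements StatMBean {
        @Override public long getCount() { return 42L; }
    }

    // Returns true iff querying the attribute after unregistration
    // fails with InstanceNotFoundException (the exception quoted above).
    public static boolean queryAfterUnregister() {
        try {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("demo:type=Stat");
            mbs.registerMBean(new Stat(), name);
            mbs.getAttribute(name, "Count");     // fine while registered
            mbs.unregisterMBean(name);
            mbs.getAttribute(name, "Count");     // monitoring tool polls a stale name
            return false;                        // unreachable if JMX behaves as expected
        } catch (InstanceNotFoundException expected) {
            return true;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(queryAfterUnregister());
    }
}
```

This is why a reconfig that unregisters and re-registers beans needs to keep the ObjectNames monitoring tools poll either stable or documented.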
[jira] [Commented] (ZOOKEEPER-1659) Add JMX support for dynamic reconfiguration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003644#comment-14003644 ] Otis Gospodnetic commented on ZOOKEEPER-1659: - +1 for making sure changes are backwards-compatible. We [monitor ZooKeeper with SPM|http://sematext.com/spm/] and would love to be able to use the same agent for multiple/all ZK versions instead of having ZK version-specific agents. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michi Mutsuzaki updated ZOOKEEPER-1621: --- Attachment: ZOOKEEPER-1621.patch ZooKeeper does not recover from crash when disk was full Key: ZOOKEEPER-1621 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.3 Environment: Ubuntu 12.04, Amazon EC2 instance Reporter: David Arthur Assignee: Michi Mutsuzaki Fix For: 3.5.0 Attachments: ZOOKEEPER-1621.patch, zookeeper.log.gz The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting java.io.IOException: No space left on device at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:282) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306) at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484) at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101) Then many subsequent exceptions like: 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial. 
2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:504) at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259) at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386) at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138) at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112) at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) It seems to me that writing the transaction log should be fully atomic to avoid such 
situations. Is this not the case? -- This message was sent by Atlassian JIRA (v6.2#6252)
Survey on Project Conventions
Hello, my name is Martin Brandtner [1] and I'm a software engineering researcher at the University of Zurich, Switzerland. Together with Philipp Leitner [2], I am currently working on an approach to detect violations of project conventions based on data from the source code repository, the issue tracker (e.g. Jira), and the build system (e.g. Jenkins). One example of such a project convention is: “You need to make sure that the commit message contains at least the name of the contributor and ideally a reference to the Bugzilla or JIRA issue where the patch was submitted.” [3] The idea is that our approach can detect violations of such a convention automatically and thereby support the development process. First of all, we need conventions, and that's why we ask you to take part in our survey. In the survey, we present five conventions and ask you to rate their relevance in your Apache project. Everybody contributing to your Apache project can take part in this survey, because we also want to see whether different roles have different opinions about a convention. The survey is completely anonymous, and it will take about 15 minutes to answer. We would be happy if you could fill out our survey at http://ww3.unipark.de/uc/SEAL_Research/1abe/ before May 30, 2014. With the data collected in this survey, we will implement convention violation detection in our tool called SQA-Timeline [4]. If you are interested in our work, contact us via email or provide your email address in the survey. Best regards, Martin and Philipp [1] http://www.ifi.uzh.ch/seal/people/brandtner.html [2] http://www.ifi.uzh.ch/seal/people/leitner.html [3] http://www.apache.org/dev/committers.html#applying-patches [4] https://www.youtube.com/watch?v=ZIsOODUapAE
Review Request 21732: ZOOKEEPER-1621
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21732/ --- Review request for zookeeper. Repository: zookeeper Description --- Modify FileTxnIterator to skip a transaction log file (instead of throwing an IOException) if the header is incomplete. Diffs - http://svn.apache.org/repos/asf/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java 1596402 http://svn.apache.org/repos/asf/zookeeper/trunk/src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java 1596402 Diff: https://reviews.apache.org/r/21732/diff/ Testing --- Added 2 testcases. Thanks, michim
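The recovery idea in this review can be sketched independently of the actual FileTxnIterator code. The sketch below is a hypothetical illustration (the magic number, file format, and class names are invented): when a log file's header is incomplete, e.g. because the disk filled up mid-write, recovery skips that file and moves on rather than aborting with an IOException.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipTruncated {
    static final int MAGIC = 0x5a4b4c47; // illustrative header magic

    // Read the header (magic + version) of each log; skip files whose
    // header is incomplete or corrupt instead of failing the whole scan.
    public static List<Integer> readHeaders(List<File> logs) {
        List<Integer> versions = new ArrayList<>();
        for (File f : logs) {
            try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
                int magic = in.readInt();
                int version = in.readInt();
                if (magic == MAGIC) {
                    versions.add(version);
                }
            } catch (EOFException e) {
                // header truncated (e.g. disk full mid-write): skip this file
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
        return versions;
    }

    // Helper to produce a complete or truncated header file for the demo.
    public static File writeLog(boolean truncate) {
        try {
            File f = File.createTempFile("txnlog", null);
            f.deleteOnExit();
            try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
                out.writeInt(MAGIC);
                if (!truncate) out.writeInt(2); // full header carries a version
            }
            return f;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        File good = writeLog(false);
        File bad = writeLog(true);
        System.out.println(readHeaders(Arrays.asList(bad, good)));
    }
}
```

The trade-off is the one discussed in the issue: a partially written final log file is treated as absent, which is safe as long as it can only contain the tail of an unacknowledged transaction.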
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003956#comment-14003956 ] Michi Mutsuzaki commented on ZOOKEEPER-1621: https://reviews.apache.org/r/21732/ -- This message was sent by Atlassian JIRA (v6.2#6252)
Failed: ZOOKEEPER-1621 PreCommit Build #2105
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1621 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 243609 lines...] [exec] [exec] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch [exec] against trunk revision 1596284. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] -1 core tests. The patch failed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] 2a6536ea1944bd1c1c757f8d35b0e21fc3637390 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1696: exec returned: 1 Total time: 36 minutes 35 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1621 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 1 tests failed. REGRESSION: org.apache.zookeeper.server.quorum.StandaloneDisabledTest.startSingleServerTest Error Message: client could not connect to reestablished quorum: giving up after 30+ seconds. Stack Trace: junit.framework.AssertionFailedError: client could not connect to reestablished quorum: giving up after 30+ seconds. at org.apache.zookeeper.test.ReconfigTest.testNormalOperation(ReconfigTest.java:153) at org.apache.zookeeper.server.quorum.StandaloneDisabledTest.startSingleServerTest(StandaloneDisabledTest.java:75) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003996#comment-14003996 ] Hadoop QA commented on ZOOKEEPER-1621: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645856/ZOOKEEPER-1621.patch against trunk revision 1596284. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2105//console This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1699) Leader should timeout and give up leadership when losing quorum of last proposed configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004011#comment-14004011 ] Michi Mutsuzaki commented on ZOOKEEPER-1699:

Sounds good, I'm checking this in.

Leader should timeout and give up leadership when losing quorum of last proposed configuration
Key: ZOOKEEPER-1699
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1699
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.5.0
Reporter: Alexander Shraer
Assignee: Alexander Shraer
Priority: Blocker
Fix For: 3.5.0
Attachments: ZOOKEEPER-1699-draft.patch, ZOOKEEPER-1699-draft.patch, ZOOKEEPER-1699-v1.patch, ZOOKEEPER-1699-v2.patch, ZOOKEEPER-1699-v3.patch, ZOOKEEPER-1699-v4.patch, ZOOKEEPER-1699-v4.patch, ZOOKEEPER-1699-v5.patch, ZOOKEEPER-1699.patch

A leader gives up leadership when losing a quorum of the current configuration. This doesn't take into account any proposed configuration. So, if a reconfig operation is in progress and a quorum of the new configuration is not responsive, the leader will just get stuck waiting for it to ACK the reconfig operation, and will never timeout.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (ZOOKEEPER-1699) Leader should timeout and give up leadership when losing quorum of last proposed configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michi Mutsuzaki resolved ZOOKEEPER-1699.

Resolution: Fixed
trunk: http://svn.apache.org/viewvc?view=revision&revision=1596422

Leader should timeout and give up leadership when losing quorum of last proposed configuration
Key: ZOOKEEPER-1699
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1699

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004109#comment-14004109 ] Alexander Shraer commented on ZOOKEEPER-1621:

Here's a different option. Intuitively, once ZooKeeper fails to write to disk, continuing to operate normally violates its promise to users (that if a majority acked, the data is always there, even across reboots). Once we realize the promise can't be kept, it may be better to crash the server at that point and violate liveness (no availability) than to continue and risk coming up with a partial log at a later point, violating safety (inconsistent state, lost transactions, etc.).

ZooKeeper does not recover from crash when disk was full
Key: ZOOKEEPER-1621
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
Reporter: David Arthur
Assignee: Michi Mutsuzaki
Fix For: 3.5.0

The disk that ZooKeeper was using filled up. During a snapshot write, a "No space left on device" IOException was raised; on restart the server logged "Last transaction was partial." and exited with an EOFException while reading the log file header. It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case?

-- This message was sent by Atlassian JIRA (v6.2#6252)
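The fail-fast policy Alex argues for above can be sketched in a few lines. This is a hypothetical illustration, not the actual SyncRequestProcessor code or the ZOOKEEPER-1621 patch: the point is simply that a failed commit should stop the server (sacrificing liveness) rather than let it keep acking writes it cannot persist (sacrificing safety).

```java
import java.io.IOException;

// Minimal sketch of "crash on write failure" (hypothetical interface
// and method names; not ZooKeeper's real API).
public class FailFastCommit {
    public interface Log {
        void commit() throws IOException;
    }

    // Returns true if the commit succeeded. On failure it triggers
    // shutdown instead of retrying: the durability promise is already
    // broken, so stopping is safer than serving with lost transactions.
    public static boolean commitOrGiveUp(Log log, Runnable shutdown) {
        try {
            log.commit();
            return true;
        } catch (IOException e) {
            // Violate liveness (stop serving) rather than safety
            // (acknowledge writes that never reached disk).
            shutdown.run();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean[] down = {false};
        commitOrGiveUp(
            () -> { throw new IOException("No space left on device"); },
            () -> { down[0] = true; });
        System.out.println(down[0]); // true: the server shut itself down
    }
}
```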
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004117#comment-14004117 ] Michi Mutsuzaki commented on ZOOKEEPER-1621:

I'm fine with Alex's suggestion. We should document how to manually recover when the server doesn't start because the log file doesn't contain the complete header.

ZooKeeper does not recover from crash when disk was full
Key: ZOOKEEPER-1621
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621

-- This message was sent by Atlassian JIRA (v6.2#6252)
ZooKeeper-trunk-jdk8 - Build # 24 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-jdk8/24/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 1441 lines...] compile_jute: [mkdir] Created dir: /home/hudson/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/src/java/generated [mkdir] Created dir: /home/hudson/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/src/c/generated [java] ../../zookeeper.jute Parsed Successfully [java] ../../zookeeper.jute Parsed Successfully [touch] Creating /home/hudson/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/src/java/generated/.generated ver-gen: [javac] Compiling 1 source file to /home/hudson/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/build/classes [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.5 [javac] warning: [options] source value 1.5 is obsolete and will be removed in a future release [javac] warning: [options] To suppress warnings about obsolete options, use -Xlint:-options. [javac] 3 warnings svn-revision: [mkdir] Created dir: /home/hudson/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/.revision version-info: process-template: build-generated: [javac] Compiling 60 source files to /home/hudson/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/build/classes [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.5 [javac] warning: [options] source value 1.5 is obsolete and will be removed in a future release [javac] warning: [options] To suppress warnings about obsolete options, use -Xlint:-options. [javac] 3 warnings compile: [javac] Compiling 185 source files to /home/hudson/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/build/classes [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.5 [javac] warning: [options] source value 1.5 is obsolete and will be removed in a future release [javac] warning: [options] To suppress warnings about obsolete options, use -Xlint:-options. 
[javac] /home/hudson/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java:65: error: cannot find symbol
[javac]     static public class Proposal extends SyncedLearnerTracker {
[javac]                                          ^
[javac]   symbol:   class SyncedLearnerTracker
[javac]   location: class Leader
[javac] /home/hudson/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/src/java/main/org/apache/zookeeper/jmx/ManagedUtil.java:62: warning: [rawtypes] found raw type: Enumeration
[javac]     Enumeration enumer = r.getCurrentLoggers();
[javac]     ^
[javac]   missing type arguments for generic class Enumeration<E>
[javac]   where E is a type-variable:
[javac]     E extends Object declared in interface Enumeration
[javac] /home/hudson/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java:69: error: method does not override or implement a method from a supertype
[javac]     @Override
[javac]     ^
BUILD FAILED
/home/hudson/jenkins-slave/workspace/ZooKeeper-trunk-jdk8/trunk/build.xml:436: Compile failed; see the compiler error output for details.
Total time: 10 seconds
Build step 'Execute shell' marked build as failure
[locks-and-latches] Releasing all the locks
[locks-and-latches] All the locks released
[WARNINGS] Skipping publisher since build result is FAILURE
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
###
## FAILED TESTS (if any)
##
No tests ran.
Build failed in Jenkins: bookkeeper-trunk #643
See https://builds.apache.org/job/bookkeeper-trunk/643/ -- [...truncated 529 lines...] --- T E S T S --- --- T E S T S --- Results : Tests run: 0, Failures: 0, Errors: 0, Skipped: 0 [INFO] [INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ bookkeeper-stats-api --- [INFO] Building jar: https://builds.apache.org/job/bookkeeper-trunk/ws/bookkeeper-stats/target/bookkeeper-stats-api-4.3.0-SNAPSHOT.jar [INFO] [INFO] findbugs-maven-plugin:2.5.2:check (default-cli) @ bookkeeper-stats-api [INFO] [INFO] --- findbugs-maven-plugin:2.5.2:findbugs (findbugs) @ bookkeeper-stats-api --- [INFO] Fork Value is true [INFO] Done FindBugs Analysis [INFO] [INFO] findbugs-maven-plugin:2.5.2:check (default-cli) @ bookkeeper-stats-api [INFO] [INFO] --- findbugs-maven-plugin:2.5.2:check (default-cli) @ bookkeeper-stats-api --- [INFO] BugInstance size is 0 [INFO] Error size is 0 [INFO] No errors/warnings found [INFO] [INFO] [INFO] Building bookkeeper-server 4.3.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ bookkeeper-server --- [INFO] Deleting https://builds.apache.org/job/bookkeeper-trunk/ws/bookkeeper-server (includes = [dependency-reduced-pom.xml], excludes = []) [INFO] [INFO] --- apache-rat-plugin:0.7:check (default-cli) @ bookkeeper-server --- [INFO] Exclude: **/DataFormats.java [INFO] [INFO] --- maven-remote-resources-plugin:1.1:process (default) @ bookkeeper-server --- [INFO] [INFO] --- maven-resources-plugin:2.4.3:resources (default-resources) @ bookkeeper-server --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] Copying 3 resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-compiler-plugin:3.0:compile (default-compile) @ bookkeeper-server --- [INFO] Changes detected - recompiling the module! 
[INFO] Compiling 174 source files to https://builds.apache.org/job/bookkeeper-trunk/ws/bookkeeper-server/target/classes [INFO] [INFO] --- maven-resources-plugin:2.4.3:testResources (default-testResources) @ bookkeeper-server --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] Copying 1 resource [INFO] Copying 3 resources [INFO] [INFO] --- maven-compiler-plugin:3.0:testCompile (default-testCompile) @ bookkeeper-server --- [INFO] Changes detected - recompiling the module! [INFO] Compiling 84 source files to https://builds.apache.org/job/bookkeeper-trunk/ws/bookkeeper-server/target/test-classes [INFO] [INFO] --- maven-surefire-plugin:2.9:test (default-test) @ bookkeeper-server --- [INFO] Surefire report directory: https://builds.apache.org/job/bookkeeper-trunk/ws/bookkeeper-server/target/surefire-reports --- T E S T S --- --- T E S T S --- Running org.apache.bookkeeper.client.SlowBookieTest Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.553 sec Running org.apache.bookkeeper.client.ListLedgersTest Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.266 sec Running org.apache.bookkeeper.client.BookieRecoveryTest Tests run: 72, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.856 sec Running org.apache.bookkeeper.client.TestReadTimeout Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.688 sec Running org.apache.bookkeeper.client.LedgerRecoveryTest Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.037 sec Running org.apache.bookkeeper.client.BookKeeperTest Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.38 sec Running org.apache.bookkeeper.client.RoundRobinDistributionScheduleTest Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.118 sec Running org.apache.bookkeeper.client.BookKeeperCloseTest Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.355 sec Running org.apache.bookkeeper.client.TestFencing Tests run: 14, Failures: 0, Errors: 
0, Skipped: 0, Time elapsed: 12.005 sec Running org.apache.bookkeeper.client.TestLedgerChecker Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.378 sec Running org.apache.bookkeeper.client.TestRackawareEnsemblePlacementPolicy Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.245 sec Running org.apache.bookkeeper.client.LedgerCloseTest Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.381 sec Running org.apache.bookkeeper.client.TestSpeculativeRead Tests
Re: Review Request 17895: BOOKKEEPER-582: protobuf support for bookkeeper
On April 23, 2014, 10:22 a.m., Ivan Kelly wrote:

bookkeeper-server/src/main/proto/BookkeeperProtocol.proto, line 33
https://reviews.apache.org/r/17895/diff/2/?file=563033#file563033line33

This should certainly not be an enum. Otherwise we need to bump the protocol version each time we add an error code. Imagine the scenario where both server and client are running 4.3.0. Then the server is upgraded to 4.3.1, which adds a new error, EPRINTERONFIRE. It sends this to the client, which throws a decode error.

Sijie Guo wrote:

How would not being an enum help this? If it is an integer, the client still has no idea how to interpret it, so it is still an invalid response to a 4.3.0 client. I thought we reached an agreement on enum on the ticket, no?

Ivan Kelly wrote:

For version and operation type, enum is OK. These originate at the client, so if the servers are always upgraded before the clients, there are no interoperability issues. Status codes originate at the server, though, so it is possible for the server to send a status code that is unrecognised by a client. The normal way to handle this would be an else or default: branch that passes it up to the client as a BKException.UnexpectedConditionException. If it's an enum, this will throw a decode exception in the netty decoder, which is harder to handle. Resolving this on the server side, by checking the version and only sending errors valid for that version, implies two things: firstly, every error code change will require the version to be bumped, and secondly, a list will need to be maintained of which errors are valid for each version. This goes against the motivation for using protobuf in the first place.

Sijie Guo wrote:

This is an application-level agreement, no? It doesn't matter whether you are using a protobuf protocol or the current protocol, and it also doesn't matter whether you are using an integer or an enum. In any case, the best way is as you described: you shouldn't send a new status code back to an old client, as the new status code is meaningless to the old client.

Ivan Kelly wrote:

But how do you know it's an old client? Only by bumping the version number each time you add an error code. In that case you end up with a whole lot of junk like if (client.version == X) { send A } else if (client.version == Y) { send B } else if (client.version ... which is exactly what protobuf was designed to avoid (see "A bit of history" on https://developers.google.com/protocol-buffers/docs/overview).

Sijie Guo wrote:

An else or default branch would make the behavior unpredictable, as an old client is treating a new status code as some kind of unknown. As you said, you want to treat them as UnexpectedConditionException. But what does UnexpectedConditionException mean? Doesn't it mean the server already breaks backward compatibility, since the server couldn't satisfy the old client's request? So still, if the server wants to be backward compatible with clients, in any case it needs to know what version of the protocol the client is speaking and handle it accordingly, not just leave clients to do their job in an unexpected way. I don't see any elegant solution without detecting the protocol version. If you have one, please describe how not being an enum would avoid this.

Ivan Kelly wrote:

The default behaviour for an unknown error code is something we already use today: https://github.com/apache/bookkeeper/blob/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java#L714 The client only needs to know that the request failed. The point of the different error codes is so that the client can take specific recovery steps; the default behaviour is just to pass the error up.

Sijie Guo wrote:

The default behavior was there just for already-known status codes; it doesn't mean it is correct for any unknown status code. And when you say 'the client only needs to know that the request failed', you are assuming there is only one status code indicating OK and every other status code should be taken as failed, but that isn't true. Say in an old protocol we supported range reads, responding with OK plus a list of entry responses (0 = data, 1 = missing, 2 = missing, 3 = data). If we are going to improve the protocol to make communication more efficient by no longer transferring missing entries, we would change it to respond with PARTIAL_OK plus a list of existing entries (0 = data, 3 = data). In this case, if the server doesn't distinguish the client's protocol and just responds to every range read with PARTIAL_OK, it would break compatibility with the old protocol, as the old protocol treats it as a failure by default behavior. In order to maintain backward compatibility, the server needs to detect the client's protocol and responds
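The "default branch" handling Ivan describes can be shown in a small sketch. This is illustrative only: the status code names and values below are made up and are not BookKeeper's actual protocol codes. Because the code arrives as a plain integer rather than an enum, a value added by a newer server still decodes cleanly and maps to a generic error instead of failing inside the decoder.

```java
// Hypothetical status-code handling sketch; codes are invented for
// illustration and do not match BookKeeper's real protocol.
public class StatusDecode {
    public static final int OK = 0, ENOENTRY = 1, EFENCED = 2;
    public static final int UNEXPECTED = -999; // generic catch-all

    // Known codes pass through; anything else (e.g. a code introduced
    // by a newer server, like Ivan's EPRINTERONFIRE) degrades to a
    // generic "unexpected condition" instead of a decode failure.
    public static int classify(int statusCode) {
        switch (statusCode) {
            case OK:
            case ENOENTRY:
            case EFENCED:
                return statusCode;   // known to this client version
            default:
                return UNEXPECTED;   // unknown: surface as generic error
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(ENOENTRY)); // 1
        System.out.println(classify(9999));     // -999
    }
}
```

Sijie's counterexample (PARTIAL_OK reinterpreting a success as a failure) is exactly the case this pattern does not cover: it is safe only when every unknown code really can be treated as "request failed".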
Re: Review Request 17895: BOOKKEEPER-582: protobuf support for bookkeeper
On April 24, 2014, 12:19 p.m., Ivan Kelly wrote:

bookkeeper-server/src/main/proto/BookkeeperProtocol.proto, line 81
https://reviews.apache.org/r/17895/diff/2/?file=563033#file563033line81

ledgerId and entryId should be optional in all requests. It may be the case that how we specify them changes in the future (like when we flatten the metadata), so it would be good to leave that possibility open.

Sijie Guo wrote:

For all existing read/write protocols, the ledgerId and entryId are required. I am not sure how you will change the protocol by flattening the metadata, but I guess that will be a pretty new protocol. If so, it would be better to add a new request type, so we don't break any existing systems or make things complicated.

Ivan Kelly wrote:

I'm not sure how it will change either. What I'm requesting is that the protobuf protocol doesn't lock us into how we are doing it now forever.

Ivan Kelly wrote:

Actually, for a concrete example: let's say we want to do a read request for a whole ledger. From bookie A we request all entries, but it doesn't have every 3rd entry due to striping. In that case we can request all entries of the ledger with entry id modulo 3 from bookie B. What would I put in the _required_ entry id field for the read to bookie B?

Sijie Guo wrote:

As I said, doesn't that sound like a new read protocol?

Ivan Kelly wrote:

It will likely go through the same codepaths, though, so we'll end up with a load of duplicate code. My concern with the requiredness of fields is that it's so rigid that in future we will have to add new messages to make any enhancements, causing the protocol to grow into something huge, with loads of redundancy, and not any better than what we have now with the manual encoding.

Sijie Guo wrote:

Single-entry read/write are primitives of a bookie; ledger id and entry id are required for them, as they are the fundamentals of bookkeeper. All other improvements like streaming or range reads could be built on these primitives. And if they are built on primitives, I don't see that we will end up with a lot of duplicated code.

Rakesh R wrote:

As far as the compatibility issue is concerned, making them optional is a defensive approach and safe coding, avoiding parsing issues later. But it should be done very carefully, because the requiredness is then handled at the code level. For example, the server would have to validate whether a request has both ledgerId and entryId, which is mandatory, etc. On the other side, with a required field the entity is not open for expansion by removing that field. But we again have options, as Sijie suggested, of defining a new protocol and doing the expansion there. If we have a better way in hand to avoid code duplication, it is OK to go ahead with required.

Sijie Guo wrote:

As I said, currently bookie storage is built on single read/add entry primitives; there isn't any reason for ledger id and entry id not to be required. If you are going to change the protocol to get rid of ledger id and entry id, you have to change the bookie storage, and then I don't think there will be any code duplication.

I agree with this for add entry, but for reading, 'entryId' can be optional. There are no real functional issues; the only concern is that this will force us to create a new protocol if any such requirement comes up in future.

- Rakesh

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17895/#review41281
---

On April 24, 2014, 7:43 a.m., Sijie Guo wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17895/
---

(Updated April 24, 2014, 7:43 a.m.)

Review request for bookkeeper and Ivan Kelly.
Bugs: BOOKKEEPER-582
https://issues.apache.org/jira/browse/BOOKKEEPER-582

Repository: bookkeeper-git

Description
---
- introducing protobuf support for bookkeeper
- for server: introduce packet processor / EnDecoder for different protocol supports
- for client: change PCBC to use protobuf to send requests
- misc changes for protobuf support (bookie server is able for backward compatibility)

Diffs
---
bookkeeper-server/pom.xml ebc1198
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/IndexInMemPageMgr.java 56487aa
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerChecker.java 28e23d6
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingReadOp.java fb36b90
bookkeeper-server/src/main/java/org/apache/bookkeeper/processor/RequestProcessor.java 241f369
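The required-versus-optional trade-off in the thread above can be made concrete with a toy decoder. This is not real protobuf code; field names mirror the review (ledgerId, entryId) but the decoder is invented for illustration. A receiver that enforces a required field rejects any future message shape that omits it (such as Ivan's whole-ledger read), while an optional field simply reads as "unset".

```java
import java.util.HashMap;
import java.util.Map;

// Toy message decoder (hypothetical; not protobuf) showing the
// compatibility consequence of required vs. optional fields.
public class RequiredVsOptional {
    // Required semantics: absence is a hard decode failure.
    public static long requiredField(Map<String, Long> msg, String name) {
        Long v = msg.get(name);
        if (v == null) {
            throw new IllegalArgumentException(name + " is required");
        }
        return v;
    }

    // Optional semantics: absence falls back to a default, leaving
    // room for message shapes that don't carry the field.
    public static long optionalField(Map<String, Long> msg, String name, long dflt) {
        Long v = msg.get(name);
        return v == null ? dflt : v;
    }

    public static void main(String[] args) {
        // A hypothetical "read the whole ledger" request with no entryId.
        Map<String, Long> wholeLedgerRead = new HashMap<>();
        wholeLedgerRead.put("ledgerId", 7L);
        System.out.println(optionalField(wholeLedgerRead, "entryId", -1L)); // -1
        // requiredField(wholeLedgerRead, "entryId") would throw instead.
    }
}
```

Sijie's position corresponds to keeping requiredField for the single-entry primitives and introducing a new message type for new shapes; Ivan's corresponds to optionalField everywhere so one message type can evolve.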
[jira] [Commented] (BOOKKEEPER-758) Add TryReadLastAddConfirmed API
[ https://issues.apache.org/jira/browse/BOOKKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003963#comment-14003963 ] Flavio Junqueira commented on BOOKKEEPER-758:

But this code does not benefit at all from the fact that the callback can be called multiple times, no? I don't think it is necessarily a big deal, but the precise semantics aren't very clear, nor how applications can benefit from potentially multiple calls to the callback.

Add TryReadLastAddConfirmed API
Key: BOOKKEEPER-758
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-758
Project: Bookkeeper
Issue Type: Improvement
Components: bookkeeper-client
Reporter: Sijie Guo
Assignee: Sijie Guo
Fix For: 4.3.0
Attachments: BOOKKEEPER-758.diff, BOOKKEEPER-758.v2.diff

Add TryReadLastConfirmed to read the last confirmed entry without coverage checking; readers that poll the LAC just need the LAC.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (BOOKKEEPER-751) Ensure all the bookkeeper callbacks not run under ledger handle lock
[ https://issues.apache.org/jira/browse/BOOKKEEPER-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004022#comment-14004022 ] Flavio Junqueira commented on BOOKKEEPER-751:

This test case failed for me, could it be related?
Failed tests: test10Ledgers200ThreadsRead(org.apache.bookkeeper.test.MultipleThreadReadTest): Test failed because we couldn't read entries

Ensure all the bookkeeper callbacks not run under ledger handle lock
Key: BOOKKEEPER-751
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-751
Project: Bookkeeper
Issue Type: Bug
Components: bookkeeper-client
Reporter: Sijie Guo
Assignee: Sijie Guo
Fix For: 4.3.0, 4.2.3
Attachments: BOOKKEEPER-751.diff

We are running bookkeeper callbacks under the ledger handle lock, which could introduce deadlock if an application calls bookkeeper functions in those callbacks.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (BOOKKEEPER-758) Add TryReadLastAddConfirmed API
[ https://issues.apache.org/jira/browse/BOOKKEEPER-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004255#comment-14004255 ] Sijie Guo commented on BOOKKEEPER-758:

To be clear, the multiple callbacks are internal to the bk client; the user only gets one callback. The benefit to the code here is that when the user receives a LAC response from any bookie, it can move on to read the entries without waiting for the other bookies' responses, so reading entries can proceed in parallel with receiving LAC responses from the remaining bookies. That benefit isn't the goal of this API, though; the API exists so readLAC doesn't block waiting for LAC responses from multiple bookies.

Add TryReadLastAddConfirmed API
Key: BOOKKEEPER-758
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-758

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (BOOKKEEPER-751) Ensure all the bookkeeper callbacks not run under ledger handle lock
[ https://issues.apache.org/jira/browse/BOOKKEEPER-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004259#comment-14004259 ]

Sijie Guo commented on BOOKKEEPER-751:
--------------------------------------

test10Ledgers200ThreadsRead is a resource-sensitive test; I have observed it failing as well. We could improve the test case, but the failure isn't related to this change.

> Ensure all the bookkeeper callbacks not run under ledger handle lock
> --------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-751
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-751
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-client
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.3.0, 4.2.3
>         Attachments: BOOKKEEPER-751.diff
>
> We are running BookKeeper callbacks under the ledger handle lock, which could introduce a deadlock if an application calls BookKeeper functions from within those callbacks.

--
This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 17895: BOOKKEEPER-582: protobuf support for bookkeeper
On April 24, 2014, 12:19 p.m., Ivan Kelly wrote:
> bookkeeper-server/src/main/proto/BookkeeperProtocol.proto, line 81
> https://reviews.apache.org/r/17895/diff/2/?file=563033#file563033line81
>
> ledgerId and entryId should be optional in all requests. It may be the case that how we specify them changes in the future (like when we flatten the metadata), so it would be good to leave that possibility open.

Sijie Guo wrote:
    For all existing read/write protocols, ledgerId and entryId are required. I am not sure how you would change the protocol by flattening the metadata, but I guess that would be a pretty new protocol. If so, it would be better to add a new request type, so we don't break any existing systems or complicate things.

Ivan Kelly wrote:
    I'm not sure how it will change either. What I'm requesting is that the protobuf protocol doesn't lock us into how we are doing it now forever.

Ivan Kelly wrote:
    Actually, for a concrete example: let's say we want to do a read request for a whole ledger. From bookie A we request all entries, but it doesn't have every 3rd entry due to striping. In that case we can request all entries of the ledger with entry id modulo 3 from bookie B. What would I put in the _required_ entry id field for the read to bookie B?

Sijie Guo wrote:
    As I said, doesn't that sound like a new read protocol?

Ivan Kelly wrote:
    It will likely go through the same codepaths though, so we'll end up with a load of duplicate code. My concern with the requiredness of fields is that it's so rigid that in the future we will have to add new messages to make any enhancement, causing the protocol to grow into something huge, with loads of redundancy, and no better than what we have now with the manual encoding.

Sijie Guo wrote:
    Single-entry read/write are primitives of the bookie; ledger id and entry id are required for them, as they are the fundamentals of BookKeeper. All other improvements, like streaming or range reads, could be built on these primitives. And if they are built on primitives, I don't see how we would end up with a lot of duplicated code.

Rakesh R wrote:
    As far as the compatibility issue is concerned, making the fields optional is a defensive approach and safer coding, avoiding parsing issues later. But it has to be done very carefully, because the requiredness is then enforced at the code level. For example, the server would have to validate that a request carries both ledgerId and entryId, decide which is mandatory, etc. On the other side, with a required field the message is not open for expansion by removing that field. But we again have the option, as Sijie suggested, of defining a new protocol for such an expansion. If we have a better way in hand to avoid code duplication, it is OK to go ahead with required.

Sijie Guo wrote:
    As I said, the bookie storage is currently built on single read/add entry primitives; there isn't any reason for ledger id and entry id not to be required. If you are going to change the protocol to get rid of ledger id and entry id, you have to change the bookie storage, and then I don't think there will be any code duplication.

Rakesh R wrote:
    I agree with this for add entry, but for reads 'entryId' could be optional. There are no real functional issues; the only concern is that this will force us to create a new protocol if such a requirement comes up in the future.

Sijie Guo wrote:
    Again, as I said, the current bookie storage is per entry. If you want to support batch reads: 1) if you don't change the bookie storage, you can build the batch read protocol on top of the single-read primitive; 2) if you change the bookie storage itself to support batch reads inside the storage, then it should be a new request type using a new method in the bookie storage, so the old read still works with storage that only supports single reads. This is for backward compatibility.

- Sijie


-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit:
https://reviews.apache.org/r/17895/#review41281
-----------------------------------------------------------


On April 24, 2014, 7:43 a.m., Sijie Guo wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17895/
> -----------------------------------------------------------
>
> (Updated April 24, 2014, 7:43 a.m.)
>
> Review request for bookkeeper and Ivan Kelly.
>
> Bugs: BOOKKEEPER-582
>     https://issues.apache.org/jira/browse/BOOKKEEPER-582
>
> Repository: bookkeeper-git
>
> Description
> -----------
>
> - introducing protobuf support for bookkeeper
> - for server: introduce packet processor / EnDecoder for different protocol supports
> - for client: change PCBC to use protobuf to send requests
> - misc changes for protobuf support (bookie server
Re: Review Request 17895: BOOKKEEPER-582: protobuf support for bookkeeper
On April 23, 2014, 10:22 a.m., Ivan Kelly wrote:
> bookkeeper-server/src/main/proto/BookkeeperProtocol.proto, line 33
> https://reviews.apache.org/r/17895/diff/2/?file=563033#file563033line33
>
> This should certainly not be an enum; otherwise we need to bump the protocol version each time we add an error code. Imagine the scenario where both server and client are running 4.3.0. Then the server is upgraded to 4.3.1, which has a new error EPRINTERONFIRE. It sends this to the client, which throws a decode error.

Sijie Guo wrote:
    How does not being an enum help here? If it is an integer, the client still has no idea how to interpret it, so it is still an invalid response for a 4.3.0 client. I thought we reached an agreement on enum on the ticket, no?

Ivan Kelly wrote:
    For version and operation type, enum is OK: these originate at the client, so if the servers are always upgraded before the clients, there are no interoperability issues. Status codes originate at the server, though, so it is possible for the server to send a status code that is unrecognised by a client. The normal way to handle this would be an else or default: branch that passes it up to the client as a BKException.UnexpectedConditionException. If it's an enum, this instead throws a decode exception in the netty decoder, which is harder to handle. Resolving this on the server side, by checking the version and only sending errors valid for that version, implies two things: first, every error code change requires the version to be bumped, and second, a list must be maintained of which errors are valid for each version. This goes against the motivation for using protobuf in the first place.

Sijie Guo wrote:
    This is an application-level agreement, no? It doesn't matter whether you are using a protobuf protocol or the current protocol, and it also doesn't matter whether you are using an integer or an enum. In any case, the best approach is as you described: you shouldn't send a new status code back to an old client, as the new status code is meaningless to the old client.

Ivan Kelly wrote:
    But how do you know it's an old client? Only by bumping the version number each time you add an error code. In which case you end up with a whole lot of junk like if (client.version == X) { send A } else if (client.version == Y) { send B } else if (client.version ..., which is exactly what protobuf was designed to avoid (see "A bit of history" on https://developers.google.com/protocol-buffers/docs/overview).

Sijie Guo wrote:
    An else or default branch would make the behavior unpredictable, as an old client would be treating a new status code as some kind of unknown. As you said, you want to treat them as UnexpectedConditionException. But what does UnexpectedConditionException mean? Doesn't it mean the server has already broken backward compatibility, since the server couldn't satisfy the old client's request? So still, if the server wants to be backward compatible with clients, in any case it needs to know which version of the protocol the client is speaking and handle it accordingly, not just leave the client to cope in an unexpected way. I don't see any elegant solution without detecting the protocol version; if you have one, please describe how not being an enum would avoid this.

Ivan Kelly wrote:
    The default behaviour for an unknown error code is something we already use today: https://github.com/apache/bookkeeper/blob/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java#L714. The client only needs to know that the request failed. The point of the different error codes is that the client can take specific recovery steps; the default behaviour is just to pass the error up.

Sijie Guo wrote:
    The default behavior was put there just for the already-known status codes; that doesn't mean it is correct for any unknown status code. And when you say 'the client only needs to know that the request failed', you are assuming there is only one status code indicating OK and that every other status code should be treated as failure, but that isn't true. Say in an old protocol we supported range reads, which responded with OK and a list of entry responses (0 = data, 1 = missing, 2 = missing, 3 = data). Suppose we then improve the protocol to make communication more efficient by no longer transferring missing entries, responding with PARTIAL_OK and a list of existing entries (0 = data, 3 = data). In that case, if the server doesn't distinguish the client's protocol and just responds to every range read with PARTIAL_OK, it breaks compatibility with the old protocol, since the old protocol treats PARTIAL_OK as a failure via the default behavior. To maintain backward compatibility, the server needs to detect the client's protocol and responds
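The default-branch handling Ivan points to can be sketched like this (a hedged illustration; the real mapping lives in PerChannelBookieClient, and the numeric values here are made up for the example): known wire status codes map to specific client errors, while anything unrecognised falls through to a single catch-all "unexpected condition" error instead of failing in the decoder.

```java
// Illustrative sketch of mapping wire status codes to client errors
// with a default branch for codes added by newer servers. The code
// names echo BookKeeper-style errors, but the values are invented.
class StatusMapper {
    static final int EOK = 0;
    static final int ENOENTRY = 2;
    static final int EIO = 101;
    static final int UNEXPECTED = -999; // catch-all client-side code

    static int toClientError(int wireStatus) {
        switch (wireStatus) {
            case EOK:      return EOK;
            case ENOENTRY: return ENOENTRY;
            case EIO:      return EIO;
            default:       return UNEXPECTED; // unknown code from a newer server
        }
    }
}
```

Keeping the status field an integer on the wire makes this default branch reachable; with a proto2 enum, an unknown value would be rejected before application code ever sees it, which is exactly the trade-off the thread is debating.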
[jira] [Updated] (BOOKKEEPER-756) Use HashedwheelTimer for request timeouts for PCBC
[ https://issues.apache.org/jira/browse/BOOKKEEPER-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-756:
---------------------------------
    Attachment: BOOKKEEPER-756.v2.diff

Addressed the comments.

> Use HashedwheelTimer for request timeouts for PCBC
> --------------------------------------------------
>
>                 Key: BOOKKEEPER-756
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-756
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-client
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.3.0, 4.2.3
>         Attachments: BOOKKEEPER-756.diff, BOOKKEEPER-756.v2.diff
>
> The current scheduler-based timeout mechanism works per batch, which isn't efficient; a HashedWheelTimer is much better suited for timeouts. So change the PCBC to use a HashedWheelTimer for timeouts. Besides the HashedWheelTimer change, this also adds support for multiple channels per bookie, for latency reasons.

--
This message was sent by Atlassian JIRA (v6.2#6252)
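As a rough illustration of why a hashed wheel suits per-request timeouts (this is a toy model, not Netty's HashedWheelTimer or the BOOKKEEPER-756 patch): timeouts are bucketed by deadline tick, so adding a timeout and expiring a bucket are each O(1), instead of scheduling a scanning task per batch of requests.

```java
import java.util.ArrayList;
import java.util.List;

// Toy hashed wheel: each tick advances the cursor one bucket and fires
// every timeout stored there. Real implementations (e.g. Netty's
// HashedWheelTimer) also track remaining rounds for delays longer than
// one full rotation; this sketch assumes delays < wheel size.
class ToyWheelTimer {
    private final List<List<Runnable>> buckets = new ArrayList<>();
    private int cursor = 0;

    ToyWheelTimer(int wheelSize) {
        for (int i = 0; i < wheelSize; i++) buckets.add(new ArrayList<>());
    }

    // O(1): drop the task into the bucket 'ticksFromNow' ahead.
    void schedule(Runnable task, int ticksFromNow) {
        buckets.get((cursor + ticksFromNow) % buckets.size()).add(task);
    }

    // Called once per tick interval by a single driver thread:
    // fire and clear the current bucket, then advance the cursor.
    void tick() {
        List<Runnable> bucket = buckets.get(cursor);
        for (Runnable r : bucket) r.run();
        bucket.clear();
        cursor = (cursor + 1) % buckets.size();
    }
}
```

Cancelling a completed request is equally cheap in the real thing (the timeout is simply marked cancelled and skipped when its bucket fires), which is what makes the wheel attractive for high request rates.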
Re: Review Request 17895: BOOKKEEPER-582: protobuf support for bookkeeper
On April 24, 2014, 12:19 p.m., Ivan Kelly wrote:
> bookkeeper-server/src/main/proto/BookkeeperProtocol.proto, line 81
> https://reviews.apache.org/r/17895/diff/2/?file=563033#file563033line81
>
> ledgerId and entryId should be optional in all requests. It may be the case that how we specify them changes in the future (like when we flatten the metadata), so it would be good to leave that possibility open.

Sijie Guo wrote:
    For all existing read/write protocols, ledgerId and entryId are required. I am not sure how you would change the protocol by flattening the metadata, but I guess that would be a pretty new protocol. If so, it would be better to add a new request type, so we don't break any existing systems or complicate things.

Ivan Kelly wrote:
    I'm not sure how it will change either. What I'm requesting is that the protobuf protocol doesn't lock us into how we are doing it now forever.

Ivan Kelly wrote:
    Actually, for a concrete example: let's say we want to do a read request for a whole ledger. From bookie A we request all entries, but it doesn't have every 3rd entry due to striping. In that case we can request all entries of the ledger with entry id modulo 3 from bookie B. What would I put in the _required_ entry id field for the read to bookie B?

Sijie Guo wrote:
    As I said, doesn't that sound like a new read protocol?

Ivan Kelly wrote:
    It will likely go through the same codepaths though, so we'll end up with a load of duplicate code. My concern with the requiredness of fields is that it's so rigid that in the future we will have to add new messages to make any enhancement, causing the protocol to grow into something huge, with loads of redundancy, and no better than what we have now with the manual encoding.

Sijie Guo wrote:
    Single-entry read/write are primitives of the bookie; ledger id and entry id are required for them, as they are the fundamentals of BookKeeper. All other improvements, like streaming or range reads, could be built on these primitives. And if they are built on primitives, I don't see how we would end up with a lot of duplicated code.

Rakesh R wrote:
    As far as the compatibility issue is concerned, making the fields optional is a defensive approach and safer coding, avoiding parsing issues later. But it has to be done very carefully, because the requiredness is then enforced at the code level. For example, the server would have to validate that a request carries both ledgerId and entryId, decide which is mandatory, etc. On the other side, with a required field the message is not open for expansion by removing that field. But we again have the option, as Sijie suggested, of defining a new protocol for such an expansion. If we have a better way in hand to avoid code duplication, it is OK to go ahead with required.

Sijie Guo wrote:
    As I said, the bookie storage is currently built on single read/add entry primitives; there isn't any reason for ledger id and entry id not to be required. If you are going to change the protocol to get rid of ledger id and entry id, you have to change the bookie storage, and then I don't think there will be any code duplication.

Rakesh R wrote:
    I agree with this for add entry, but for reads 'entryId' could be optional. There are no real functional issues; the only concern is that this will force us to create a new protocol if such a requirement comes up in the future.

Sijie Guo wrote:
    Again, as I said, the current bookie storage is per entry. If you want to support batch reads: 1) if you don't change the bookie storage, you can build the batch read protocol on top of the single-read primitive; 2) if you change the bookie storage itself to support batch reads inside the storage, then it should be a new request type using a new method in the bookie storage, so the old read still works with storage that only supports single reads. This is for backward compatibility.
Sijie Guo wrote:
    One more comment: for any protocol requirements, please keep in mind what kinds of operations the bookie storage currently supports, and what the backward-compatibility implications for the bookie storage are.

OK, makes sense to me.

- Rakesh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17895/#review41281
-----------------------------------------------------------


On April 24, 2014, 7:43 a.m., Sijie Guo wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17895/
> -----------------------------------------------------------
>
> (Updated April 24, 2014, 7:43 a.m.)
>
> Review request for bookkeeper and Ivan Kelly.
>
> Bugs: BOOKKEEPER-582
>     https://issues.apache.org/jira/browse/BOOKKEEPER-582
>
> Repository: bookkeeper-git
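The compromise the thread converges on can be sketched as a proto fragment (hypothetical, not the actual BookkeeperProtocol.proto): the existing single-entry request keeps its fields required, while future semantics such as range reads get a new request type with optional bounds, rather than loosening the old fields.

```protobuf
// Hypothetical sketch of the "required primitives, new types for new
// semantics" position discussed above.
message ReadRequest {
  required int64 ledgerId = 1;  // single-read primitive: both required
  required int64 entryId  = 2;
}

message RangeReadRequest {      // new request type for new semantics
  required int64 ledgerId   = 1;
  optional int64 firstEntry = 2;  // absent = from the start of the ledger
  optional int64 lastEntry  = 3;  // absent = up to the LAC
}
```

In proto2, a `required` field can never be removed without breaking old parsers, which is why the choice between `required` and a new message type matters here.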