[jira] [Commented] (ZOOKEEPER-737) some 4 letter words may fail with netcat (nc)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103677#comment-13103677 ]

Camille Fournier commented on ZOOKEEPER-737:
--------------------------------------------

Netty is apparently even worse: it hangs the connection, and I can't even ctrl-c out of it in telnet.

some 4 letter words may fail with netcat (nc)
---------------------------------------------

    Key: ZOOKEEPER-737
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-737
    Project: ZooKeeper
    Issue Type: Bug
    Components: server
    Affects Versions: 3.3.0
    Reporter: Patrick Hunt
    Assignee: Mahadev konar
    Priority: Blocker
    Fix For: 3.3.1, 3.4.0
    Attachments: ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch

nc closes the write channel as soon as it has sent its information, for example

    echo stat|nc localhost 2181

In general this is fine; however, the server code will close the socket as soon as it receives notice that nc has closed its write channel. If the full 4 letter word result has not yet been written back to the client, some or all of the result will be lost, i.e. the client will not see the full result. This was introduced in 3.3.0 as part of a change to reduce blocking of the selector by long-running 4 letter words.

Here's an example of the logs from the server during this:

echo -n stat | nc localhost 2181
2010-04-09 21:55:36,124 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:42179
2010-04-09 21:55:36,124 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@968] - Processing stat command from /127.0.0.1:42179
2010-04-09 21:55:36,125 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@606] - EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
2010-04-09 21:55:36,125 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1286] - Closed socket connection for client /127.0.0.1:42179 (no session established for client)
[phunt@gsbl90850 zookeeper-3.3.0]$ 2010-04-09 21:55:36,126 - ERROR [Thread-15:NIOServerCnxn@422] - Unexpected Exception:
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
    at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:395)
    at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:907)
    at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945)
    at java.io.BufferedWriter.flush(BufferedWriter.java:236)
    at java.io.PrintWriter.flush(PrintWriter.java:276)
    at org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089)
2010-04-09 21:55:36,126 - ERROR [Thread-15:NIOServerCnxn$Factory$1@82] - Thread Thread[Thread-15,5,main] died
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:64)
    at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.wakeup(NIOServerCnxn.java:927)
    at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:909)
    at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945)
    at java.io.BufferedWriter.flush(BufferedWriter.java:236)
    at java.io.PrintWriter.flush(PrintWriter.java:276)
    at org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
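For readers who want to reproduce the half-close behaviour described in ZOOKEEPER-737 without nc, the following is a minimal Java sketch of what `echo stat | nc localhost 2181` does at the socket level: write the command, shut down the output side, then read until the server closes the connection. The class and method names (`FourLetterWordClient`, `send`, `readAll`) are illustrative assumptions and are not part of ZooKeeper; host, port and timeout are placeholders.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of ZooKeeper.
class FourLetterWordClient {

    // Reads until end-of-stream. If the server closes early, the
    // returned text is simply shorter than the full reply.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Sends a four-letter word the way nc does: write, half-close, read.
    static String send(String host, int port, String cmd, int timeoutMs)
            throws IOException {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), timeoutMs);
            sock.setSoTimeout(timeoutMs);
            OutputStream out = sock.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Half-close: the server sees end-of-stream on its read side
            // while this client can still receive data.
            sock.shutdownOutput();
            return readAll(sock.getInputStream());
        }
    }
}
```

Calling `FourLetterWordClient.send("localhost", 2181, "stat", 5000)` against an affected server should return a truncated or empty reply when the race described above is lost; a complete reply indicates the server finished writing before closing.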
[jira] [Created] (ZOOKEEPER-1179) NettyServerCnxn does not properly close socket on 4 letter word requests
NettyServerCnxn does not properly close socket on 4 letter word requests
------------------------------------------------------------------------

    Key: ZOOKEEPER-1179
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1179
    Project: ZooKeeper
    Issue Type: Bug
    Affects Versions: 3.4.0
    Reporter: Camille Fournier

When calling a 4-letter-word to a server configured to use NettyServerCnxnFactory, the factory will not properly cancel all the keys and close the socket after sending the response for the 4lw. The close request will throw this exception, and the thread will not shut down:

2011-09-13 12:14:17,546 - WARN [New I/O server worker #1-1:NettyServerCnxnFactory$CnxnChannelHandler@117] - Exception caught [id: 0x009300cc, /1.1.1.1:38542 = /139.172.114.138:2181] EXCEPTION: java.io.IOException: A non-blocking socket operation could not be completed immediately
java.io.IOException: A non-blocking socket operation could not be completed immediately
    at sun.nio.ch.SocketDispatcher.close0(Native Method)
    at sun.nio.ch.SocketDispatcher.preClose(SocketDispatcher.java:44)
    at sun.nio.ch.SocketChannelImpl.implCloseSelectableChannel(SocketChannelImpl.java:684)
    at java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:201)
    at java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:97)
    at org.jboss.netty.channel.socket.nio.NioWorker.close(NioWorker.java:593)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:119)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:76)
    at org.jboss.netty.channel.Channels.close(Channels.java:720)
    at org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:208)
    at org.apache.zookeeper.server.NettyServerCnxn.close(NettyServerCnxn.java:116)
    at org.apache.zookeeper.server.NettyServerCnxn.cleanupWriterSocket(NettyServerCnxn.java:241)
    at org.apache.zookeeper.server.NettyServerCnxn.access$0(NettyServerCnxn.java:231)
    at org.apache.zookeeper.server.NettyServerCnxn$CommandThread.run(NettyServerCnxn.java:314)
    at org.apache.zookeeper.server.NettyServerCnxn$CommandThread.start(NettyServerCnxn.java:305)
    at org.apache.zookeeper.server.NettyServerCnxn.checkFourLetterWord(NettyServerCnxn.java:674)
    at org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:791)
    at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.processMessage(NettyServerCnxnFactory.java:217)
    at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.messageReceived(NettyServerCnxnFactory.java:141)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350)
    at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)
    at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[jira] [Commented] (ZOOKEEPER-1178) Add eclipse target for supporting Apache IvyDE
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103758#comment-13103758 ]

Warren Turkal commented on ZOOKEEPER-1178:
------------------------------------------

Looks like this issue is still unassigned. Would anyone be willing to take it on? It's a really small patch.

Add eclipse target for supporting Apache IvyDE
----------------------------------------------

    Key: ZOOKEEPER-1178
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1178
    Project: ZooKeeper
    Issue Type: Improvement
    Components: build
    Environment: Mac OS X w/ Eclipse 3.7. However, I believe this will work in any Eclipse environment.
    Reporter: Warren Turkal
    Priority: Minor
    Attachments: eclipse-apache-ivyde-support.patch
    Original Estimate: 1h
    Remaining Estimate: 1h

This patch adds support for Eclipse with Apache IvyDE, which is the extension that integrates Ivy support into Eclipse. This allows the creation of what appear to be fully portable .eclipse and .classpath files.

I will be posting a patch shortly.
[jira] [Commented] (ZOOKEEPER-737) some 4 letter words may fail with netcat (nc)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103862#comment-13103862 ]

Patrick Hunt commented on ZOOKEEPER-737:
----------------------------------------

Which version of nc are you using? The newer (Ubuntu packaging) openbsd one, or traditional? You might try both. We got it to work well with traditional, then Ubuntu went and made openbsd the default:

{noformat}
phunt@ubuntu:~$ nc
This is nc from the netcat-openbsd package. An alternative nc is available
in the netcat-traditional package.
usage: nc [-46DdhklnrStUuvzC] [-i interval] [-P proxy_username] [-p source_port]
          [-s source_ip_address] [-T ToS] [-w timeout] [-X proxy_protocol]
          [-x proxy_address[:port]] [hostname] [port[s]]
phunt@ubuntu:~$ ls /bin/nc.*
/bin/nc.openbsd*  /bin/nc.traditional*
{noformat}
[jira] [Commented] (ZOOKEEPER-1174) FD leak when network unreachable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103890#comment-13103890 ]

Patrick Hunt commented on ZOOKEEPER-1174:
-----------------------------------------

If we go with powermock, let's use the mockito variety.

FD leak when network unreachable
--------------------------------

    Key: ZOOKEEPER-1174
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1174
    Project: ZooKeeper
    Issue Type: Bug
    Components: java client
    Affects Versions: 3.3.3
    Reporter: Ted Dunning
    Assignee: Ted Dunning
    Priority: Critical
    Fix For: 3.3.4
    Attachments: ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, ZOOKEEPER-1174.patch, zk-fd-leak.tgz

In the socket connection logic there are several errors that result in bad behavior. The basic problem is that a socket is registered with a selector unconditionally when there are nuances that should be dealt with. First, the socket may connect immediately. Secondly, the connect may throw an exception. In either of these two cases, I don't think that the socket should be registered.

I will attach a test case that demonstrates the problem. I have been unable to create a unit test that exhibits the problem because I would have to mock the low level socket libraries to do so. It would still be good to do so if somebody can figure out a good way.
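The two cases named in the ZOOKEEPER-1174 description (connect completes immediately, or connect throws) can be sketched with plain java.nio as follows. This is only an illustration of the idea under stated assumptions, not the attached patch: the class name `GuardedConnect` and the choice to return `null` on failure are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Hypothetical sketch; not the ZooKeeper client code or the attached patch.
class GuardedConnect {

    // Starts a non-blocking connect. The channel is registered with the
    // selector only while the connect is still pending. Returns null,
    // with the channel closed, if the connect attempt throws.
    static SocketChannel connect(Selector selector, InetSocketAddress addr)
            throws IOException {
        SocketChannel ch = SocketChannel.open();
        try {
            ch.configureBlocking(false);
            boolean connectedNow = ch.connect(addr);
            if (!connectedNow) {
                // Only a pending connect needs an OP_CONNECT registration.
                ch.register(selector, SelectionKey.OP_CONNECT);
            }
            return ch;
        } catch (IOException | RuntimeException e) {
            // Closing the channel releases its file descriptor, so a
            // failed attempt leaves nothing behind in the selector.
            ch.close();
            return null;
        }
    }
}
```

A caller would check `isConnected()` on the returned channel to decide whether to proceed directly or wait for the selector to report `OP_CONNECT`. Catching `RuntimeException` as well covers unchecked failures such as `UnresolvedAddressException`.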
[jira] [Commented] (ZOOKEEPER-1166) Please add a few svn:ignore properties
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103895#comment-13103895 ]

Patrick Hunt commented on ZOOKEEPER-1166:
-----------------------------------------

I didn't generate eclipse in my testing. Yes, please enter a new jira for this. Thx.

Please add a few svn:ignore properties
--------------------------------------

    Key: ZOOKEEPER-1166
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1166
    Project: ZooKeeper
    Issue Type: Improvement
    Components: build
    Affects Versions: 3.4.0
    Reporter: Warren Turkal
    Assignee: Patrick Hunt
    Priority: Minor
    Fix For: 3.4.0
    Original Estimate: 1h
    Remaining Estimate: 1h

Please add a couple svn:ignore properties to make dealing with the code slightly easier.

At the root, please add an svn:ignore property for build so that the default build directory for eclipse is excluded.

At src/java/lib, please add an svn:ignore property for *.jar so that jars acquired by ivy are ignored.
[jira] [Commented] (ZOOKEEPER-737) some 4 letter words may fail with netcat (nc)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103912#comment-13103912 ]

Camille Fournier commented on ZOOKEEPER-737:
--------------------------------------------

I have no idea. This isn't on Ubuntu, though, so I suspect it's traditional.

Should I make a ticket for this fix? I don't feel like I fully understand the problem myself, but if you guys think that just extending the SO_LINGER for the 4lws is the right way to go for the NIO version, I'm happy to make a patch and test it out.
[jira] [Assigned] (ZOOKEEPER-961) Watch recovery after disconnection when connection string contains a prefix
[ https://issues.apache.org/jira/browse/ZOOKEEPER-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar reassigned ZOOKEEPER-961:
--------------------------------------

    Assignee: Matthias Spycher

Watch recovery after disconnection when connection string contains a prefix
---------------------------------------------------------------------------

    Key: ZOOKEEPER-961
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-961
    Project: ZooKeeper
    Issue Type: Bug
    Components: java client
    Affects Versions: 3.3.1
    Environment: Windows 32 bits
    Reporter: pmpm47
    Assignee: Matthias Spycher
    Priority: Critical
    Fix For: 3.3.4, 3.4.0
    Attachments: ZOOKEEPER-961.patch, ZOOKEEPER-961.patch, ZOOKEEPER-961b.patch

Let's say you're using connection string 127.0.0.1:2182/foo.
1) put a childrenchanged watch on relative / (that is, on absolute path /foo)
2) stop the zk server
3) start the zk server
4) at this point, the client recovers the connection, and should have put back a watch on relative path /, but instead the client puts a watch on the *absolute* path /
- if some other client adds or removes a node under /foo, nothing will happen
- if some other client adds or removes a node under /, then you will get an error from the zk client library (string operation error)
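The path translation that ZOOKEEPER-961 turns on can be illustrated with a small Java sketch. With a connection string ending in /foo, the client-relative path / must be sent to the server as /foo when a watch is re-registered, and server paths must be mapped back before events reach the application. The helper below is a hypothetical illustration (the names `ChrootPaths`, `toServerPath` and `toClientPath` are assumptions), not the ZooKeeper client implementation.

```java
// Hypothetical helper illustrating chroot path translation;
// not taken from the ZooKeeper client.
class ChrootPaths {

    // Client-relative path -> absolute path as seen by the server.
    static String toServerPath(String chroot, String clientPath) {
        if (chroot == null || chroot.isEmpty()) {
            return clientPath;
        }
        // The chroot node itself is addressed as "/" by the client.
        return clientPath.equals("/") ? chroot : chroot + clientPath;
    }

    // Absolute server path -> client-relative path.
    static String toClientPath(String chroot, String serverPath) {
        if (chroot == null || chroot.isEmpty()) {
            return serverPath;
        }
        if (serverPath.equals(chroot)) {
            return "/";
        }
        if (serverPath.startsWith(chroot + "/")) {
            return serverPath.substring(chroot.length());
        }
        // A path outside the chroot cannot be mapped back; fail clearly
        // instead of producing a malformed string.
        throw new IllegalArgumentException(
                "path " + serverPath + " is outside chroot " + chroot);
    }
}
```

With chroot /foo, `toServerPath("/foo", "/")` yields /foo, which is the path the watch should be restored on in step 4. The explicit check in `toClientPath` is a design choice of this sketch; it may correspond to the "string operation error" the reporter saw for events under absolute /, but that link is an assumption.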
[jira] [Commented] (ZOOKEEPER-737) some 4 letter words may fail with netcat (nc)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104041#comment-13104041 ]

Camille Fournier commented on ZOOKEEPER-737:
--------------------------------------------

So here's an interesting observation: if I do nothing but set the socket's SO_LINGER with

    sock.socket().setSoLinger(false, -1);

everything actually seems to work. The exception is interactive nc, which still fails, but I don't care about that.

I'm going to suggest we add the fix for ZOOKEEPER-1049 into 3.3.4. We still need to fix netty.
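As a small illustration of the `setSoLinger(false, -1)` call mentioned in the comment above, the sketch below disables SO_LINGER on a `SocketChannel` and reads the setting back. With lingering disabled, `close()` does not wait on unsent data and the operating system continues delivering it in the background. The class name `LingerSetting` is a hypothetical placeholder; where ZooKeeper actually applies this setting is not shown here.

```java
import java.io.IOException;
import java.nio.channels.SocketChannel;

// Hypothetical illustration; not the ZooKeeper server code.
class LingerSetting {

    // Disables SO_LINGER and returns the value reported afterwards.
    // java.net.Socket.getSoLinger() returns -1 when the option is disabled.
    static int disableLinger(SocketChannel sock) throws IOException {
        // When the first argument is false, the linger value is not used.
        sock.socket().setSoLinger(false, -1);
        return sock.socket().getSoLinger();
    }
}
```

Calling `LingerSetting.disableLinger(channel)` on an open channel should report -1 regardless of any earlier linger timeout set on that socket.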
[jira] [Updated] (ZOOKEEPER-961) Watch recovery after disconnection when connection string contains a prefix
[ https://issues.apache.org/jira/browse/ZOOKEEPER-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-961:
-----------------------------------

    Attachment: ZOOKEEPER-961b.patch

Reuploading Matt's patch for hudson CI.
[jira] [Commented] (ZOOKEEPER-961) Watch recovery after disconnection when connection string contains a prefix
[ https://issues.apache.org/jira/browse/ZOOKEEPER-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104058#comment-13104058 ]

Mahadev konar commented on ZOOKEEPER-961:
-----------------------------------------

Matthias,
 Looks like the patch doesn't apply to the 3.3 branch. Can you please create a patch for that?

thanks
[jira] [Commented] (ZOOKEEPER-961) Watch recovery after disconnection when connection string contains a prefix
[ https://issues.apache.org/jira/browse/ZOOKEEPER-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104066#comment-13104066 ]

Hadoop QA commented on ZOOKEEPER-961:
-------------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12494339/ZOOKEEPER-961b.patch
  against trunk revision 1170365.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/530//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/530//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/530//console

This message is automatically generated.
Success: ZOOKEEPER-961 PreCommit Build #530
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-961
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/530/

###
## LAST 60 LINES OF THE CONSOLE
###
[...truncated 138396 lines...]
     [exec] BUILD SUCCESSFUL
     [exec] Total time: 0 seconds
     [exec]
     [exec]
     [exec]
     [exec]
     [exec] +1 overall. Here are the results of testing the latest attachment
     [exec]   http://issues.apache.org/jira/secure/attachment/12494339/ZOOKEEPER-961b.patch
     [exec]   against trunk revision 1170365.
     [exec]
     [exec]     +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included. The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
     [exec]
     [exec]     +1 release audit. The applied patch does not increase the total number of release audit warnings.
     [exec]
     [exec]     +1 core tests. The patch passed core unit tests.
     [exec]
     [exec]     +1 contrib tests. The patch passed contrib unit tests.
     [exec]
     [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/530//testReport/
     [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/530//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
     [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/530//console
     [exec]
     [exec] This message is automatically generated.
     [exec]
     [exec]
     [exec] ==
     [exec] ==
     [exec]     Adding comment to Jira.
     [exec] ==
     [exec] ==
     [exec]
     [exec]
     [exec] Comment added.
     [exec] 3L44w1Dw1z logged out
     [exec]
     [exec]
     [exec] ==
     [exec] ==
     [exec]     Finished build.
     [exec] ==
     [exec] ==
     [exec]
     [exec]

BUILD SUCCESSFUL
Total time: 24 minutes 24 seconds
Archiving artifacts
Recording test results
Description set: ZOOKEEPER-961
Email was triggered for: Success
Sending email for trigger: Success

###
## FAILED TESTS (if any)
##
All tests passed
[jira] [Created] (ZOOKEEPER-1180) New entry for files ignored by svn.
New entry for files ignored by svn. --- Key: ZOOKEEPER-1180 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1180 Project: ZooKeeper Issue Type: Improvement Reporter: Warren Turkal Assignee: Patrick Hunt Priority: Minor The following entry needs to be added to the svn:ignore property for src/java/lib: ant-eclipse-*.jar This will ignore the ant-eclipse-*.jar file which is downloaded when running the ant eclipse target. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (ZOOKEEPER-1181) Fix problems with Kerberos TGT renewal
Fix problems with Kerberos TGT renewal -- Key: ZOOKEEPER-1181 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1181 Project: ZooKeeper Issue Type: Bug Components: java client, server Affects Versions: 3.4.0 Reporter: Eugene Koontz Assignee: Eugene Koontz Fix For: 3.4.0 Currently, in Zookeeper trunk, there are two problems with Kerberos TGT renewal: 1. TGTs obtained from a keytab are not refreshed periodically. They should be, just as those from ticket cache are refreshed. 2. Ticket renewal should be retried if it fails. Ticket renewal might fail if two or more separate processes (different JVMs) running as the same user try to renew Kerberos credentials at the same time. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
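A minimal sketch of the retry behaviour described in item 2 above, under stated assumptions: `RenewalRetry`, `runWithRetry`, and the jittered delay are hypothetical names and choices made for illustration, not code from the attached patch or from ZooKeeper itself. The random jitter is one plausible way to make it less likely that two JVMs running as the same user collide again on the next attempt.

```java
import java.util.Random;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration only; not part of ZooKeeper. */
public final class RenewalRetry {
    private static final Random RNG = new Random();

    private RenewalRetry() {}

    /**
     * Runs renewAction until it succeeds or maxAttempts is reached.
     * Returns the number of attempts used; rethrows the last failure
     * if every attempt failed.
     */
    public static int runWithRetry(Callable<Void> renewAction, int maxAttempts, long baseDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                renewAction.call();
                return attempt;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Assumed policy: fixed base delay plus random jitter, so that
                    // processes that failed together do not retry in lockstep.
                    long jitter = (long) (RNG.nextDouble() * baseDelayMs);
                    Thread.sleep(baseDelayMs + jitter);
                }
            }
        }
        throw last;
    }
}
```

The renewal action itself (for example, re-running the login for a keytab or ticket cache) is left to the caller, since the source does not show how the patch performs it.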
[jira] [Updated] (ZOOKEEPER-1125) Intermittent java core test failures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koontz updated ZOOKEEPER-1125: - Attachment: fail_on_27th_iteration.log.gz Unfortunately CnxManagerTest failed on the 27th iteration (please see attached log). Intermittent java core test failures Key: ZOOKEEPER-1125 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1125 Project: ZooKeeper Issue Type: Bug Reporter: Vishal Kher Assignee: Vishal Kher Priority: Blocker Fix For: 3.4.0 Attachments: ZOOKEEPER-1125.patch, fail_on_27th_iteration.log.gz, repeat-script.patch, zk1125.log.gz Some of the tests are consistently failing for me and intermittently on hudson. Posting discussion from mailing list below. Vishal, Can you please open a jira for this and mark it as a blocker for 3.4 release? Looks like its transient: https://builds.apache.org/job/ZooKeeper-trunk/ The latest build is passing. thanks mahadev - Hide quoted text - On Mon, Jul 11, 2011 at 12:49 PM, Vishal Kher vishalm...@gmail.com wrote: Hi, ant test-core-java is consistently failing for me. 
The error seems to be either: Testcase: testFollowersStartAfterLeader took 35.577 sec Caused an ERROR Did not connect java.util.concurrent.TimeoutException: Did not connect at org.apache.zookeeper.test.ClientBase$CountdownWatcher.waitForConnected(ClientBase.java:124) at org.apache.zookeeper.test.QuorumTest.testFollowersStartAfterLeader(QuorumTest.java:308) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) or Testcase: testNoLogBeforeLeaderEstablishment took 8.831 sec Caused an ERROR KeeperErrorCode = ConnectionLoss for /blah org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /blah at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:761) at org.apache.zookeeper.test.QuorumTest.testNoLogBeforeLeaderEstablishment(QuorumTest.java:385) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) Looks like the reason why the tests are failing for me is similar to why the tests failed on hudson: 2011-07-11 14:47:26,219 [myid:] - INFO [QuorumPeer[myid=2]/0.0.0.0:11379 :Leader@425] - Shutdown called java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 1 at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:425) at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:400) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:729) 2011-07-11 14:47:26,220 [myid:] - INFO [QuorumPeer[myid=2]/0.0.0.0:11379 :ZooKeeperServer@416] - shutting down The leader is not able to ping the followers. Has anyone seen this before? Thanks. -Vishal On Sun, Jul 10, 2011 at 6:52 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: See https://builds.apache.org/job/ZooKeeper-trunk/1239/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 242795 lines...] 
[junit] 2011-07-10 10:57:16,673 [myid:] - INFO [main:SessionTrackerImpl@206] - Shutting down [junit] 2011-07-10 10:57:16,673 [myid:] - INFO [main:PrepRequestProcessor@702] - Shutting down [junit] 2011-07-10 10:57:16,674 [myid:] - INFO [main:SyncRequestProcessor@170] - Shutting down [junit] 2011-07-10 10:57:16,674 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@152] - SyncRequestProcessor exited! [junit] 2011-07-10 10:57:16,675 [myid:] - INFO [main:FinalRequestProcessor@423] - shutdown of request processor complete [junit] 2011-07-10 10:57:16,674 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@133] - PrepRequestProcessor exited loop! [junit] 2011-07-10 10:57:16,676 [myid:] - INFO [main:ClientBase@227] - connecting to 127.0.0.1 11221 [junit] ensureOnly:[] [junit] 2011-07-10 10:57:16,677 [myid:] - INFO [main:ClientBase@428] - STARTING server [junit] 2011-07-10 10:57:16,678 [myid:] - INFO [main:ZooKeeperServer@164] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test1139867753736175617.junit.dir/version-2 snapdir /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test1139867753736175617.junit.dir/version-2 [junit] 2011-07-10
backwards compatibility tests
i'm just putting this out there: we have a backwards compatibility requirement, and we try to make sure that we achieve it, but we don't have any tests for it. does anyone have any great ideas (and perhaps energy to implement) about how to do some nice tests? it would be nice to do it in a very low-maintenance, automatic way. ben
Failed: ZOOKEEPER-1181 PreCommit Build #531
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1181 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/531/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 138431 lines...] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12494346/ZOOKEEPER-1181.patch [exec] against trunk revision 1170365. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. [exec] [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/531//testReport/ [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/531//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/531//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] Dj2U430XER logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1452: exec returned: 2 Total time: 25 minutes 18 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-1181 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (ZOOKEEPER-1181) Fix problems with Kerberos TGT renewal
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104092#comment-13104092 ] Hadoop QA commented on ZOOKEEPER-1181: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12494346/ZOOKEEPER-1181.patch against trunk revision 1170365. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/531//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/531//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/531//console This message is automatically generated. Fix problems with Kerberos TGT renewal -- Key: ZOOKEEPER-1181 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1181 Project: ZooKeeper Issue Type: Bug Components: java client, server Affects Versions: 3.4.0 Reporter: Eugene Koontz Assignee: Eugene Koontz Labels: kerberos, security Fix For: 3.4.0 Attachments: ZOOKEEPER-1181.patch Currently, in Zookeeper trunk, there are two problems with Kerberos TGT renewal: 1. TGTs obtained from a keytab are not refreshed periodically. 
They should be, just as those from ticket cache are refreshed. 2. Ticket renewal should be retried if it fails. Ticket renewal might fail if two or more separate processes (different JVMs) running as the same user try to renew Kerberos credentials at the same time. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1125) Intermittent java core test failures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104111#comment-13104111 ] Eugene Koontz commented on ZOOKEEPER-1125: -- Just for some additional detail on my own testing. CnxManagerTest is failing in testWorkerThreads(). This test starts , when the test shuts down members of its set of Quorum Peers, one at a time, and restarts replacements for them. Apparently sometimes, these replacements apparently are not coming up in a timely fashion. Intermittent java core test failures Key: ZOOKEEPER-1125 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1125 Project: ZooKeeper Issue Type: Bug Reporter: Vishal Kher Assignee: Vishal Kher Priority: Blocker Fix For: 3.4.0 Attachments: ZOOKEEPER-1125.patch, fail_on_27th_iteration.log.gz, repeat-script.patch, zk1125.log.gz Some of the tests are consistently failing for me and intermittently on hudson. Posting discussion from mailing list below. Vishal, Can you please open a jira for this and mark it as a blocker for 3.4 release? Looks like its transient: https://builds.apache.org/job/ZooKeeper-trunk/ The latest build is passing. thanks mahadev - Hide quoted text - On Mon, Jul 11, 2011 at 12:49 PM, Vishal Kher vishalm...@gmail.com wrote: Hi, ant test-core-java is consistently failing for me. 
[jira] [Commented] (ZOOKEEPER-1125) Intermittent java core test failures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104114#comment-13104114 ] Eugene Koontz commented on ZOOKEEPER-1125: -- Sorry, the second sentence in the above should read: This test starts a set of Quorum Peers and then shuts them down, one at a time, and starts replacements for the ones that were shut down. Intermittent java core test failures Key: ZOOKEEPER-1125 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1125 Project: ZooKeeper Issue Type: Bug Reporter: Vishal Kher Assignee: Vishal Kher Priority: Blocker Fix For: 3.4.0 Attachments: ZOOKEEPER-1125.patch, fail_on_27th_iteration.log.gz, repeat-script.patch, zk1125.log.gz Some of the tests are consistently failing for me and intermittently on hudson. Posting discussion from mailing list below. Vishal, Can you please open a jira for this and mark it as a blocker for 3.4 release? Looks like its transient: https://builds.apache.org/job/ZooKeeper-trunk/ The latest build is passing. thanks mahadev - Hide quoted text - On Mon, Jul 11, 2011 at 12:49 PM, Vishal Kher vishalm...@gmail.com wrote: Hi, ant test-core-java is consistently failing for me. 
[jira] [Updated] (ZOOKEEPER-961) Watch recovery after disconnection when connection string contains a prefix
[ https://issues.apache.org/jira/browse/ZOOKEEPER-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Spycher updated ZOOKEEPER-961: --- Attachment: ZOOKEEPER-961.patch This attachment is intended for branch 3.3. It prepends the chroot (if any) for all watches in SendThread.primeConnection(...). Also note that the check for closing is done in SendThread.startConnect() after the sleep. We still have a potential race between close/connecting, but it's an improvement. Watch recovery after disconnection when connection string contains a prefix --- Key: ZOOKEEPER-961 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-961 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Environment: Windows 32 bits Reporter: pmpm47 Assignee: Matthias Spycher Priority: Critical Fix For: 3.3.4, 3.4.0 Attachments: ZOOKEEPER-961.patch, ZOOKEEPER-961.patch, ZOOKEEPER-961.patch, ZOOKEEPER-961b.patch, ZOOKEEPER-961b.patch Let's say you're using connection string 127.0.0.1:2182/foo. 1) put a childrenchanged watch on relative / (that is, on absolute path /foo) 2) stop the zk server 3) start the zk server 4) at this point, the client recovers the connection, and should have put back a watch on relative path /, but instead the client puts a watch on the *absolute* path / - if some other client adds or removes a node under /foo, nothing will happen - if some other client adds or removes a node under /, then you will get an error from the zk client library (string operation error) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
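The path mapping behind this bug can be sketched as follows. This is an illustrative, self-contained example, assuming a hypothetical class `ChrootPaths`; it is not the code from the attached patch. It shows the rule the description implies: with connection string 127.0.0.1:2182/foo, the client path / corresponds to the server path /foo, and every watch path must be mapped this way before being re-registered after a reconnect.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not the patched ZooKeeper code. */
public final class ChrootPaths {
    private ChrootPaths() {}

    /** Maps a client-relative path to the absolute server path under the chroot. */
    public static String prependChroot(String chroot, String clientPath) {
        if (chroot == null || chroot.isEmpty()) {
            return clientPath;            // no chroot configured
        }
        if (clientPath.equals("/")) {
            return chroot;                // relative "/" is the chroot node itself
        }
        return chroot + clientPath;
    }

    /** Maps every watch path before it is sent back to the server on reconnect. */
    public static List<String> toServerPaths(String chroot, List<String> clientPaths) {
        List<String> out = new ArrayList<String>(clientPaths.size());
        for (String p : clientPaths) {
            out.add(prependChroot(chroot, p));
        }
        return out;
    }
}
```

Under this sketch, the watch from step 1 is restored on /foo rather than on the absolute path /, which is the behaviour the report expects.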
Failed: ZOOKEEPER-961 PreCommit Build #533
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-961 Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/533/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 74 lines...] [exec] 1 out of 1 hunk ignored -- saving rejects to file src/java/main/org/apache/zookeeper/ZooKeeper.java.rej [exec] patching file src/java/main/org/apache/zookeeper/ClientCnxn.java [exec] Hunk #1 succeeded at 25 with fuzz 2 (offset -6 lines). [exec] Hunk #2 FAILED at 965. [exec] Hunk #3 succeeded at 874 (offset -126 lines). [exec] Hunk #4 FAILED at 1029. [exec] Hunk #5 FAILED at 1049. [exec] Hunk #6 FAILED at 1088. [exec] Hunk #7 succeeded at 1021 with fuzz 2 (offset -138 lines). [exec] 4 out of 7 hunks FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/ClientCnxn.java.rej [exec] PATCH APPLICATION FAILED [exec] [exec] [exec] [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12494355/ZOOKEEPER-961.patch [exec] against trunk revision 1170365. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] -1 patch. The patch command could not apply the patch. [exec] [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/533//console [exec] [exec] This message is automatically generated. [exec] [exec] [exec] == [exec] == [exec] Adding comment to Jira. [exec] == [exec] == [exec] [exec] [exec] Comment added. [exec] Pgkzu9Jxn9 logged out [exec] [exec] [exec] == [exec] == [exec] Finished build. 
[exec] == [exec] == [exec] [exec] BUILD FAILED /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1452: exec returned: 1 Total time: 36 seconds Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Description set: ZOOKEEPER-961 Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Commented] (ZOOKEEPER-961) Watch recovery after disconnection when connection string contains a prefix
[ https://issues.apache.org/jira/browse/ZOOKEEPER-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104121#comment-13104121 ] Hadoop QA commented on ZOOKEEPER-961: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12494355/ZOOKEEPER-961.patch against trunk revision 1170365. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/533//console This message is automatically generated. Watch recovery after disconnection when connection string contains a prefix --- Key: ZOOKEEPER-961 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-961 Project: ZooKeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Environment: Windows 32 bits Reporter: pmpm47 Assignee: Matthias Spycher Priority: Critical Fix For: 3.3.4, 3.4.0 Attachments: ZOOKEEPER-961.patch, ZOOKEEPER-961.patch, ZOOKEEPER-961.patch, ZOOKEEPER-961b.patch, ZOOKEEPER-961b.patch Let's say you're using connection string 127.0.0.1:2182/foo. 1) put a childrenchanged watch on relative / (that is, on absolute path /foo) 2) stop the zk server 3) start the zk server 4) at this point, the client recovers the connection, and should have put back a watch on relative path /, but instead the client puts a watch on the *absolute* path / - if some other client adds or removes a node under /foo, nothing will happen - if some other client adds or removes a node under /, then you will get an error from the zk client library (string operation error) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-992) MT Native Version of Windows C Client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104137#comment-13104137 ] ZhangQi commented on ZOOKEEPER-992: --- I found a strange phenomenon when testing your zk trunk with a C program (built with Visual Studio 2003). I run the zk service on a 3-machine cluster. First I create a node /zookeeper/zk_test01. Then I insert sequence nodes /zookeeper/zk_test01/node. The zk service fails when the number of sequence nodes exceeds 20, and at the same time zkcli loses its connection to the zk service. How does this happen? Does zk have a limit on the number of nodes? MT Native Version of Windows C Client -- Key: ZOOKEEPER-992 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-992 Project: ZooKeeper Issue Type: New Feature Components: c client Environment: Windows 32 Reporter: Camille Fournier Assignee: Dheeraj Agrawal Fix For: 3.4.0 Attachments: ZOOKEEPER-992-final.patch, ZOOKEEPER-992-final.patch, ZOOKEEPER-992-round8.patch, ZOOKEEPER-992.patch, ZOOKEEPER-992.patch, ZOOKEEPER_992_FINAL.patch, ZOOKEEPER_992_final.patch, errors.txt, win32-odysseus-vc2k5.patch, win32_patch.txt, win32_patch_notabs.txt This is an extension of the work in https://issues.apache.org/jira/browse/ZOOKEEPER-859 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-992) MT Native Version of Windows C Client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104147#comment-13104147 ] Mahadev konar commented on ZOOKEEPER-992: - Zhang, ZK has a limit on the amount of data that can be transferred between the server and the client (i.e. 1MB). If you are creating direct children of a single node and the data is more than 1MB, getChildren might fail. Also, please open a separate jira to track this. MT Native Version of Windows C Client -- Key: ZOOKEEPER-992 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-992 Project: ZooKeeper Issue Type: New Feature Components: c client Environment: Windows 32 Reporter: Camille Fournier Assignee: Dheeraj Agrawal Fix For: 3.4.0 Attachments: ZOOKEEPER-992-final.patch, ZOOKEEPER-992-final.patch, ZOOKEEPER-992-round8.patch, ZOOKEEPER-992.patch, ZOOKEEPER-992.patch, ZOOKEEPER_992_FINAL.patch, ZOOKEEPER_992_final.patch, errors.txt, win32-odysseus-vc2k5.patch, win32_patch.txt, win32_patch_notabs.txt This is an extension of the work in https://issues.apache.org/jira/browse/ZOOKEEPER-859 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
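To get a feel for when a child list approaches the limit mentioned in this comment, here is a rough estimate written as a sketch. `ChildListSize` is a hypothetical class, and the arithmetic assumes the child names are serialized as a 4-byte element count followed by, for each name, a 4-byte length and its UTF-8 bytes; reply headers and framing are ignored, so treat the result as an approximation rather than an exact wire size.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical estimator for illustration only; not part of ZooKeeper. */
public final class ChildListSize {
    private ChildListSize() {}

    /** Approximate serialized size, in bytes, of a list of child names. */
    public static long estimateBytes(List<String> childNames) {
        long total = 4;                                   // assumed 4-byte element count
        for (String name : childNames) {
            // assumed 4-byte length prefix plus the UTF-8 bytes of the name
            total += 4 + name.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    /** True if the estimate stays within the given limit (for example 1 MB). */
    public static boolean fitsWithin(List<String> childNames, long limitBytes) {
        return estimateBytes(childNames) <= limitBytes;
    }
}
```

For example, a single child named `node0000000001` (14 characters) contributes 18 bytes under these assumptions, so the estimate for a one-element list is 22 bytes.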
[jira] [Updated] (BOOKKEEPER-39) Bookie server failed to restart because of too many ledgers (more than ~50,000 ledgers)
[ https://issues.apache.org/jira/browse/BOOKKEEPER-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sijie Guo updated BOOKKEEPER-39: Attachment: bookkeeper-39.patch Created a patch to partition the ledgers into 2-level hashed zk nodes, which avoids the packetLen exception during garbage collection. Bookie server failed to restart because of too many ledgers (more than ~50,000 ledgers) --- Key: BOOKKEEPER-39 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-39 Project: Bookkeeper Issue Type: Bug Components: bookkeeper-server Affects Versions: 3.4.0 Reporter: Sijie Guo Attachments: bookkeeper-39.patch If we have ~500,000 topics in hedwig, we might have more than ~500,000 ledgers in bookkeeper (a topic has more than 1 ledger). So when the bookie server restarts, a logfile GC thread is started, which calls zk.getChildren to fetch all ledgers, and the call fails because of the packet length limit. 2011-08-01 01:18:46,373 - ERROR [main-EventThread:EntryLogger$GarbageCollectorThread$1@164] - Error polling ZK for the available ledger nodes: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ledgers at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1519) at org.apache.bookkeeper.bookie.EntryLogger$GarbageCollectorThread$1.processResult(EntryLogger.java:162) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:592) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:481) 2011-08-01 01:18:46,373 - WARN [main-EventThread:Bookie$1@242] - ZK client has been disconnected to the ZK server! 
2011-08-01 01:18:47,278 - WARN [main-SendThread(perf13.platform.mobile.sp2.yahoo.com:2181):ClientCnxn$SendThread@980] - Session 0x131833dec850034 for server perf13.platform.mobile.sp2.yahoo.com/98.139.43.86:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Packet len9976413 is out of range! at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:78) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:264) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:958) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
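One possible way to partition ledgers into two hashed levels is sketched below. The layout is an assumption made for illustration (a fan-out of 256 per level and an `L` plus ten-digit leaf name); `LedgerPaths` is a hypothetical class and the attached bookkeeper-39.patch may use a different scheme.

```java
import java.util.Locale;

/** Hypothetical layout for illustration only; not taken from bookkeeper-39.patch. */
public final class LedgerPaths {
    private static final int FANOUT = 256;   // assumed number of buckets per level

    private LedgerPaths() {}

    /** Returns a znode path of the form root/xx/yy/Lnnnnnnnnnn for the ledger id. */
    public static String pathFor(String root, long ledgerId) {
        if (ledgerId < 0) {
            throw new IllegalArgumentException("ledgerId must be non-negative");
        }
        int level1 = (int) (ledgerId % FANOUT);              // first bucket
        int level2 = (int) ((ledgerId / FANOUT) % FANOUT);   // second bucket
        return String.format(Locale.ROOT, "%s/%02x/%02x/L%010d",
                root, level1, level2, ledgerId);
    }
}
```

With this layout no single getChildren call has to return every ledger: the two upper levels hold at most 256 children each, and the ledgers are spread over up to 65,536 leaf directories. The trade-off is that a full scan, such as the one the garbage collector performs, needs many small calls instead of one large one.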
[jira] [Commented] (ZOOKEEPER-737) some 4 letter words may fail with netcat (nc)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104214#comment-13104214 ] Benjamin Reed commented on ZOOKEEPER-737: - that is weird, since we should have already done setSoLinger(false, -1) when the socket was accepted. (see the constructor for NIOServerCnxn.) perhaps the slight delay of invoking the setsolinger gives the packets enough time to get out... some 4 letter words may fail with netcat (nc) - Key: ZOOKEEPER-737 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-737 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.3.0 Reporter: Patrick Hunt Assignee: Mahadev konar Priority: Blocker Fix For: 3.3.1, 3.4.0 Attachments: ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch, ZOOKEEPER-737.patch nc closes the write channel as soon as it's sent its information, for example echo stat|nc localhost 2181 in general this is fine, however the server code will close the socket as soon as it receives notice that nc has closed its write channel. if the 4 letter word result has not been fully written back to the client yet, this will cause some or all of the result to be lost - i.e. the client will not see the full result. this was introduced in 3.3.0 as part of a change to reduce blocking of the selector by long running 4 letter words. 
here's an example of the logs from the server during this echo -n stat | nc localhost 2181 2010-04-09 21:55:36,124 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:42179 2010-04-09 21:55:36,124 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@968] - Processing stat command from /127.0.0.1:42179 2010-04-09 21:55:36,125 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@606] - EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket 2010-04-09 21:55:36,125 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1286] - Closed socket connection for client /127.0.0.1:42179 (no session established for client) [phunt@gsbl90850 zookeeper-3.3.0]$ 2010-04-09 21:55:36,126 - ERROR [Thread-15:NIOServerCnxn@422] - Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:395) at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:907) at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945) at java.io.BufferedWriter.flush(BufferedWriter.java:236) at java.io.PrintWriter.flush(PrintWriter.java:276) at org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089) 2010-04-09 21:55:36,126 - ERROR [Thread-15:NIOServerCnxn$Factory$1@82] - Thread Thread[Thread-15,5,main] died java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:64) at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.wakeup(NIOServerCnxn.java:927) at 
org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:909) at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945) at java.io.BufferedWriter.flush(BufferedWriter.java:236) at java.io.PrintWriter.flush(PrintWriter.java:276) at org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
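For anyone who wants to reproduce the client side of ZOOKEEPER-737 without nc: the issue is that nc closes its write channel as soon as it has sent the command, while the server still has to deliver the whole reply. Below is a minimal Java sketch of that half-close pattern. It is not ZooKeeper code. The class HalfCloseDemo, its method names, and the toy one-shot server are all made up for illustration, and the reply text is a placeholder rather than real "stat" output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class HalfCloseDemo {

    /** Hypothetical toy server: accepts one connection, reads a 4-byte command, writes the reply. */
    public static void serveOnce(ServerSocket listener, String reply) {
        Thread t = new Thread(() -> {
            try (Socket s = listener.accept()) {
                InputStream in = s.getInputStream();
                byte[] cmd = new byte[4];
                int n = 0;
                while (n < cmd.length) {
                    int r = in.read(cmd, n, cmd.length - n);
                    if (r < 0) {
                        break; // client closed its write side early
                    }
                    n += r;
                }
                // The client may already have half-closed. The full reply is
                // written and flushed before the socket is closed.
                OutputStream out = s.getOutputStream();
                out.write(reply.getBytes(StandardCharsets.US_ASCII));
                out.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.setDaemon(true);
        t.start();
    }

    /** Sends the command, half-closes the socket the way nc does, then reads until EOF. */
    public static String sendLikeNc(InetAddress host, int port, String cmd) {
        try (Socket s = new Socket(host, port)) {
            OutputStream out = s.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            s.shutdownOutput(); // write side closed, read side stays open

            InputStream in = s.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int r;
            while ((r = in.read(chunk)) != -1) {
                buf.write(chunk, 0, r);
            }
            return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the toy server on an ephemeral loopback port and returns what the client received. */
    public static String roundTrip(String cmd, String reply) {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            serveOnce(listener, reply);
            return sendLikeNc(listener.getInetAddress(), listener.getLocalPort(), cmd);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(roundTrip("stat", "placeholder reply\n"));
    }
}
```

Pointing sendLikeNc at a real server on port 2181 should behave like "echo -n stat | nc localhost 2181": if the server closes the socket when it sees the client's EOF, the returned string will be short or empty.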
[jira] [Updated] (ZOOKEEPER-1125) Intermittent java core test failures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-1125:

Priority: Major (was: Blocker)

Just committed the patch. Thanks Vishal. I am downgrading the jira to Major. I don't think we should block the release on a test case failure that happens rarely.

Intermittent java core test failures
Key: ZOOKEEPER-1125
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1125
Project: ZooKeeper
Issue Type: Bug
Reporter: Vishal Kher
Assignee: Vishal Kher
Fix For: 3.4.0
Attachments: ZOOKEEPER-1125.patch, fail_on_27th_iteration.log.gz, repeat-script.patch, zk1125.log.gz

Some of the tests are consistently failing for me and intermittently on hudson. Posting discussion from mailing list below.

Vishal,
Can you please open a jira for this and mark it as a blocker for the 3.4 release? Looks like it's transient: https://builds.apache.org/job/ZooKeeper-trunk/
The latest build is passing.
thanks
mahadev

On Mon, Jul 11, 2011 at 12:49 PM, Vishal Kher vishalm...@gmail.com wrote:
Hi,
ant test-core-java is consistently failing for me.
The error seems to be either:

Testcase: testFollowersStartAfterLeader took 35.577 sec
Caused an ERROR
Did not connect
java.util.concurrent.TimeoutException: Did not connect
    at org.apache.zookeeper.test.ClientBase$CountdownWatcher.waitForConnected(ClientBase.java:124)
    at org.apache.zookeeper.test.QuorumTest.testFollowersStartAfterLeader(QuorumTest.java:308)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)

or

Testcase: testNoLogBeforeLeaderEstablishment took 8.831 sec
Caused an ERROR
KeeperErrorCode = ConnectionLoss for /blah
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /blah
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:761)
    at org.apache.zookeeper.test.QuorumTest.testNoLogBeforeLeaderEstablishment(QuorumTest.java:385)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)

Looks like the reason why the tests are failing for me is similar to why the tests failed on hudson:

2011-07-11 14:47:26,219 [myid:] - INFO [QuorumPeer[myid=2]/0.0.0.0:11379 :Leader@425] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 1
    at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:425)
    at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:400)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:729)
2011-07-11 14:47:26,220 [myid:] - INFO [QuorumPeer[myid=2]/0.0.0.0:11379 :ZooKeeperServer@416] - shutting down

The leader is not able to ping the followers. Has anyone seen this before?

Thanks.
-Vishal

On Sun, Jul 10, 2011 at 6:52 AM, Apache Jenkins Server jenk...@builds.apache.org wrote:
See https://builds.apache.org/job/ZooKeeper-trunk/1239/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 242795 lines...]
[junit] 2011-07-10 10:57:16,673 [myid:] - INFO [main:SessionTrackerImpl@206] - Shutting down
[junit] 2011-07-10 10:57:16,673 [myid:] - INFO [main:PrepRequestProcessor@702] - Shutting down
[junit] 2011-07-10 10:57:16,674 [myid:] - INFO [main:SyncRequestProcessor@170] - Shutting down
[junit] 2011-07-10 10:57:16,674 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@152] - SyncRequestProcessor exited!
[junit] 2011-07-10 10:57:16,675 [myid:] - INFO [main:FinalRequestProcessor@423] - shutdown of request processor complete
[junit] 2011-07-10 10:57:16,674 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@133] - PrepRequestProcessor exited loop!
[junit] 2011-07-10 10:57:16,676 [myid:] - INFO [main:ClientBase@227] - connecting to 127.0.0.1 11221
[junit] ensureOnly:[]
[junit] 2011-07-10 10:57:16,677 [myid:] - INFO [main:ClientBase@428] - STARTING server
[junit] 2011-07-10 10:57:16,678 [myid:] - INFO [main:ZooKeeperServer@164] - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 6 datadir /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test1139867753736175617.junit.dir/version-2 snapdir
[jira] [Commented] (ZOOKEEPER-1125) Intermittent java core test failures
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104246#comment-13104246 ]

Mahadev konar commented on ZOOKEEPER-1125:

I'll leave this open since the issue isn't entirely fixed.

Intermittent java core test failures
Key: ZOOKEEPER-1125
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1125
Project: ZooKeeper
Issue Type: Bug
Reporter: Vishal Kher
Assignee: Vishal Kher
Fix For: 3.4.0
Attachments: ZOOKEEPER-1125.patch, fail_on_27th_iteration.log.gz, repeat-script.patch, zk1125.log.gz
[jira] [Commented] (ZOOKEEPER-961) Watch recovery after disconnection when connection string contains a prefix
[ https://issues.apache.org/jira/browse/ZOOKEEPER-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104268#comment-13104268 ]

Mahadev konar commented on ZOOKEEPER-961:

+1 looks good to me.

Watch recovery after disconnection when connection string contains a prefix
Key: ZOOKEEPER-961
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-961
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.3.1
Environment: Windows 32 bits
Reporter: pmpm47
Assignee: Matthias Spycher
Priority: Critical
Fix For: 3.3.4, 3.4.0
Attachments: ZOOKEEPER-961.patch, ZOOKEEPER-961.patch, ZOOKEEPER-961.patch, ZOOKEEPER-961b.patch, ZOOKEEPER-961b.patch

Let's say you're using connection string 127.0.0.1:2182/foo.
1) put a childrenchanged watch on relative / (that is, on absolute path /foo)
2) stop the zk server
3) start the zk server
4) at this point, the client recovers the connection, and should have put back a watch on relative path /, but instead the client puts a watch on the *absolute* path /
- if some other client adds or removes a node under /foo, nothing will happen
- if some other client adds or removes a node under /, then you will get an error from the zk client library (string operation error)
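To make the ZOOKEEPER-961 report concrete: with connection string 127.0.0.1:2182/foo, a watch on relative path / has to be re-registered on absolute path /foo after a reconnect, and an event for a path outside /foo cannot be mapped back to a relative path. The Java sketch below only illustrates that path arithmetic. It is not the attached patch and not the ZooKeeper client's internal code; the class ChrootPaths and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ChrootPaths {

    /** Maps a client-relative path to the absolute server path under the prefix. */
    public static String toServerPath(String chroot, String clientPath) {
        if (chroot == null || chroot.isEmpty()) {
            return clientPath;
        }
        // Relative "/" under prefix "/foo" is "/foo" itself, not "/foo/".
        return clientPath.equals("/") ? chroot : chroot + clientPath;
    }

    /** Maps an absolute server path back to the client-relative path. */
    public static String toClientPath(String chroot, String serverPath) {
        if (chroot == null || chroot.isEmpty()) {
            return serverPath;
        }
        if (serverPath.equals(chroot)) {
            return "/";
        }
        if (serverPath.startsWith(chroot + "/")) {
            return serverPath.substring(chroot.length());
        }
        // An event for a path outside the prefix (for example absolute "/")
        // has no relative form. Blindly stripping the prefix here is the kind
        // of string operation that can fail.
        throw new IllegalArgumentException(serverPath + " is outside " + chroot);
    }

    /** Paths to re-register on the server after a reconnect, given the client's relative watch paths. */
    public static List<String> watchesToRestore(String chroot, List<String> clientWatchPaths) {
        List<String> serverPaths = new ArrayList<>();
        for (String p : clientWatchPaths) {
            serverPaths.add(toServerPath(chroot, p));
        }
        return serverPaths;
    }

    public static void main(String[] args) {
        System.out.println(toServerPath("/foo", "/"));
        System.out.println(toClientPath("/foo", "/foo/bar"));
    }
}
```

In the scenario from the report, the step that goes wrong corresponds to sending "/" instead of toServerPath("/foo", "/") when the watches are put back.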
[jira] [Commented] (BOOKKEEPER-65) fix dependencies on incompatible versions of netty
[ https://issues.apache.org/jira/browse/BOOKKEEPER-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103476#comment-13103476 ]

Ivan Kelly commented on BOOKKEEPER-65:

Running mvn clean test with this patch applied causes test failures for me. Failures attached.

fix dependencies on incompatible versions of netty
Key: BOOKKEEPER-65
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-65
Project: Bookkeeper
Issue Type: Bug
Reporter: Matthieu Morel
Attachments: BOOKKEEPER-65.patch, org.apache.hedwig.server.integration.TestHedwigHub.txt

bookkeeper-benchmark and hedwig-client depend on netty 3.1.2.GA
bookkeeper-server depends on netty 3.2.4.Final
These versions are actually incompatible, due to a change to the ProtobufDecoder constructor's signature
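When two modules pull in different netty versions, it can help to check which jar a class was actually loaded from and which constructor signatures it exposes at runtime. The following is a rough Java sketch using only JDK reflection. The class ConstructorProbe and its method names are hypothetical and not part of BookKeeper or netty, and the netty class name passed in main is the Netty 3 package name as far as I know.

```java
import java.lang.reflect.Constructor;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ConstructorProbe {

    /** Returns the public constructor signatures of the named class, or an empty list if it is absent. */
    public static List<String> constructorSignatures(String className) {
        try {
            // initialize=false: only inspect the class, do not run its static initializers
            Class<?> c = Class.forName(className, false, ConstructorProbe.class.getClassLoader());
            List<String> out = new ArrayList<>();
            for (Constructor<?> ctor : c.getConstructors()) {
                List<String> params = new ArrayList<>();
                for (Class<?> p : ctor.getParameterTypes()) {
                    params.add(p.getName());
                }
                out.add("(" + String.join(", ", params) + ")");
            }
            Collections.sort(out);
            return out;
        } catch (ClassNotFoundException e) {
            return Collections.emptyList();
        }
    }

    /** Returns the location (usually a jar) the named class was loaded from. */
    public static String loadedFrom(String className) {
        try {
            Class<?> c = Class.forName(className, false, ConstructorProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes typically report no code source
            return src == null ? "(no code source)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.jboss.netty.handler.codec.protobuf.ProtobufDecoder";
        System.out.println(name + " loaded from " + loadedFrom(name));
        for (String sig : constructorSignatures(name)) {
            System.out.println("  " + sig);
        }
    }
}
```

Running it once with each module's test classpath should show whether the two modules see different ProtobufDecoder constructors.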