[jira] [Commented] (ZOOKEEPER-1381) Add a method to get the zookeeper server version from the client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400750#comment-13400750 ]

Zhihong Ted Yu commented on ZOOKEEPER-1381:
---

Looks like UnimplementedRequestProcessor().processRequest() can be served by a singleton UnimplementedRequestProcessor.

Add a method to get the zookeeper server version from the client

Key: ZOOKEEPER-1381
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1381
Project: ZooKeeper
Issue Type: Improvement
Components: c client, documentation, java client, server
Affects Versions: 3.4.2
Environment: all
Reporter: nkeywal
Priority: Minor
Labels: newbie
Attachments: 1381.br33.v3.patch

The Zookeeper client API is designed to be as server-version agnostic as possible, so we can have new clients with old servers (or the opposite). But today there is no simple way for a client to know the server version. This would be very useful in order to:
- check compatibility (e.g. the 'multi' implementation is available since 3.4, while the 3.4 client API supports 3.3 servers as well)
- have different implementations depending on the server's functionality

A workaround (proposed by Mahadev Konar) is to run {{echo stat | nc hostname clientport}} and parse the output to get the version. The output is, for example:
{noformat}
Zookeeper version: 3.4.2--1, built on 01/30/2012 17:43 GMT
Clients: /127.0.0.1:54951[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Outstanding: 0
Zxid: 0x50001
Mode: follower
Node count: 7
{noformat}

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
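The workaround above is easy to script. Below is a minimal sketch, assuming you already captured the `stat` output as a string; the class and method names are mine, not part of any ZooKeeper API:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from ZooKeeper): pulls the version number out of
// the first line of "echo stat | nc hostname clientport" output.
public class StatVersionParser {
    private static final Pattern VERSION =
            Pattern.compile("Zookeeper version: (\\d+\\.\\d+\\.\\d+)");

    // Returns e.g. "3.4.2", or null when the output has no version line.
    public static String parseVersion(String statOutput) {
        Matcher m = VERSION.matcher(statOutput);
        return m.find() ? m.group(1) : null;
    }
}
```

A client could compare the parsed version against "3.4.0" before attempting a multi, for instance.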
[jira] [Updated] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated ZOOKEEPER-1560:
---
Attachment: zookeeper-1560-v5.txt

From https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1215//testReport/org.apache.zookeeper.test/ClientTest/testLargeNodeData/ :
{code}
2012-10-12 14:10:50,042 [myid:] - WARN [main-SendThread(localhost:11221):ClientCnxn$SendThread@1089] - Session 0x13a555031cf for server localhost/127.0.0.1:11221, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Couldn't write 2000 bytes, 1152 bytes written
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:142)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2012-10-12 14:10:50,044 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x13a555031cf, likely client has closed socket
	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
	at java.lang.Thread.run(Thread.java:662)
{code}
Patch v5 adds more information to the exception message.

Zookeeper client hangs on creation of large nodes

Key: ZOOKEEPER-1560
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
Fix For: 3.5.0, 3.4.5
Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, zookeeper-1560-v5.txt

To reproduce, try creating a node with 0.5M of data using the java client. The test will hang waiting for a response from the server. See the attached patch for the test that reproduces the issue.

It seems that ZOOKEEPER-1437 introduced a few issues to {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from sending large packets that require several invocations of {{SocketChannel.write}} to complete. The first issue is that the call to {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue even if the packet wasn't completely sent yet. It looks to me that this call should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that {{p.createBB()}} is reinitializing the {{ByteBuffer}} on every iteration, which confuses {{SocketChannel.write}}. And the third issue is caused by extra calls to {{cnxn.getXid()}} that increment the xid on every iteration and confuse the server.
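The first two issues come down to partial-write handling. A standalone sketch of the intended behavior (my names and structure, not the actual ClientCnxnSocketNIO code): the ByteBuffer is created once, and the packet is dequeued only after the buffer is fully drained, however many write() calls that takes:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only, not the ZooKeeper implementation.
public class PartialWriteSketch {
    // Attempts one write of the packet at the head of the queue; returns true
    // when the head packet (if any) has been completely sent.
    static boolean writeHead(WritableByteChannel sock, Deque<ByteBuffer> outgoingQueue)
            throws IOException {
        ByteBuffer bb = outgoingQueue.peekFirst();   // peek: do NOT remove yet
        if (bb == null) return true;                 // nothing to send
        sock.write(bb);                              // may be a partial write
        if (!bb.hasRemaining()) {
            outgoingQueue.removeFirst();             // fully sent: now dequeue
            return true;
        }
        return false;                                // retry on the next writable event
    }
}
```

Because the same buffer object is handed to write() on each attempt, its position tracks how much was already sent, which is exactly what re-running createBB() destroyed.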
[jira] [Updated] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated ZOOKEEPER-1560:
---
Attachment: zookeeper-1560-v6.txt

Patch v6 changes the condition for raising the IOException: it is raised only if there is no progress between successive sock.write() calls. I guess the socket's output buffer might be a limiting factor in the number of bytes written by a particular sock.write() call.

Zookeeper client hangs on creation of large nodes

Key: ZOOKEEPER-1560
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
Fix For: 3.5.0, 3.4.5
Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, zookeeper-1560-v5.txt, zookeeper-1560-v6.txt
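The v6 condition can be sketched as follows (my rendering of the idea, not the patch itself): a single short write just means the socket's output buffer is full, so fail only when successive write() calls make no progress at all:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Illustrative sketch of the patch-v6 condition; names are mine.
public class NoProgressCheck {
    static void writeFully(WritableByteChannel sock, ByteBuffer bb) throws IOException {
        int stalls = 0;
        while (bb.hasRemaining()) {
            int n = sock.write(bb);        // a short (even zero-byte) write is normal once
            if (n > 0) {
                stalls = 0;                // made progress: reset the stall counter
            } else if (++stalls >= 2) {    // no progress across successive calls: give up
                throw new IOException("Couldn't write " + bb.remaining() + " remaining bytes");
            }
        }
    }
}
```

In the real non-blocking client the retry would go back through the selector rather than spin in a loop; the sketch only shows where the progress check sits.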
[jira] [Updated] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated ZOOKEEPER-1560:
---
Attachment: zookeeper-1560-v7.txt

Patch v7 changes the IOException to a warning. Let's see if the test is able to make further progress. I wonder whether 77152 bytes would be big enough for most use cases.

Zookeeper client hangs on creation of large nodes

Key: ZOOKEEPER-1560
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
Fix For: 3.5.0, 3.4.5
Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475146#comment-13475146 ]

Ted Yu commented on ZOOKEEPER-1560:
---

The good news is that patch v7 passed. The not-so-good news is that I didn't find any occurrence of the warning message I added in v7. Essentially patch v7 is the same as patch v2 - we shouldn't bail if a single sock.write() call didn't make progress.

Zookeeper client hangs on creation of large nodes

Key: ZOOKEEPER-1560
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
Fix For: 3.5.0, 3.4.5
Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt
[jira] [Commented] (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476591#comment-13476591 ]

Ted Yu commented on ZOOKEEPER-107:
---

I wonder if patch v7 from ZOOKEEPER-1560 would help prevent such test failure. Will combine the two patches and run these two tests locally.

Allow dynamic changes to server cluster membership

Key: ZOOKEEPER-107
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
Project: ZooKeeper
Issue Type: Improvement
Components: server
Reporter: Patrick Hunt
Assignee: Alexander Shraer
Fix For: 3.5.0
Attachments: SimpleAddition.rtf, zkreconfig-usenixatc-final.pdf, ZOOKEEPER-107-14-Oct.patch, ZOOKEEPER-107-15-Oct.patch, ZOOKEEPER-107-15-Oct-ver1.patch, ZOOKEEPER-107-15-Oct-ver2.patch, ZOOKEEPER-107-15-Oct-ver3.patch, ZOOKEEPER-107-1-Mar.patch, ZOOKEEPER-107-20-July.patch, ZOOKEEPER-107-21-July.patch, ZOOKEEPER-107-22-Apr.patch, ZOOKEEPER-107-23-SEP.patch, ZOOKEEPER-107-28-Feb.patch, ZOOKEEPER-107-28-Feb.patch, ZOOKEEPER-107-29-Feb.patch, ZOOKEEPER-107-3-Oct.patch, ZOOKEEPER-107-Aug-20.patch, ZOOKEEPER-107-Aug-20-ver1.patch, ZOOKEEPER-107-Aug-25.patch, zookeeper-3.4.0.jar, zookeeper-dev-fatjar.jar, zookeeper-reconfig-sep11.patch, zookeeper-reconfig-sep12.patch, zoo_replicated1.cfg, zoo_replicated1.members

Currently cluster membership is statically defined; adding/removing hosts to/from the server cluster dynamically needs to be supported.
[jira] [Commented] (ZOOKEEPER-107) Allow dynamic changes to server cluster membership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476702#comment-13476702 ]

Ted Yu commented on ZOOKEEPER-107:
---

ReconfigTest hung with the combined patch. The test output is quite long (25MB). I am not familiar with ReconfigTest, so I am not sure what to look for in the test output.
{code}
LOG.warn("Couldn't write " + expectedSize + " bytes, "
{code}
I verified that the above log, which I added in patch v7 for ZOOKEEPER-1560, didn't appear in the test output.

Allow dynamic changes to server cluster membership

Key: ZOOKEEPER-107
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
Project: ZooKeeper
Issue Type: Improvement
Components: server
Reporter: Patrick Hunt
Assignee: Alexander Shraer
Fix For: 3.5.0
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483597#comment-13483597 ]

Ted Yu commented on ZOOKEEPER-1560:
---

Looking at createBB(), upon exit the field bb wouldn't be null. I wonder why p.createBB() is enclosed in the if (p.bb != null) block above?

Zookeeper client hangs on creation of large nodes

Key: ZOOKEEPER-1560
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
Fix For: 3.5.0, 3.4.5
Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483601#comment-13483601 ]

Ted Yu commented on ZOOKEEPER-1560:
---

bq. similar to what it was before - write as much as possible and then use the selector to wait for the socket to become writeable again

I looked at svn log for ClientCnxnSocketNIO.java back to 2011-04-12 and didn't seem to find the above change. FYI

Zookeeper client hangs on creation of large nodes

Key: ZOOKEEPER-1560
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
Fix For: 3.5.0, 3.4.5
Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt
[jira] [Commented] (ZOOKEEPER-1568) multi should have a non-transaction version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483618#comment-13483618 ]

Ted Yu commented on ZOOKEEPER-1568:
---

bq. A multi will case one new snapshot/log to be generated

I guess you meant 'cause' above.

bq. but there was no guarantee they'd all succeed/fail.

I think we need to formalize how the success/failure status for individual operations in this new multi API should be delivered back to the client.

multi should have a non-transaction version

Key: ZOOKEEPER-1568
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1568
Project: ZooKeeper
Issue Type: Improvement
Reporter: Jimmy Xiang

Currently multi is transactional, i.e. all or none. However, sometimes we don't want that. We want all operations to be executed: even if some operation(s) fail, it is ok. We just need to know the result of each operation.
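One way to formalize the per-operation status could look like the sketch below. This is a hypothetical shape, not the ZooKeeper API: every operation runs regardless of earlier failures, and the caller receives one result per operation, in submission order:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical non-transactional multi; names and result encoding are mine.
public class NonTransactionalMulti {
    // Runs every op and records either its value or its error; never aborts.
    public static List<String> runAll(List<Callable<String>> ops) {
        List<String> results = new ArrayList<>();
        for (Callable<String> op : ops) {
            try {
                results.add("OK:" + op.call());
            } catch (Exception e) {
                results.add("ERR:" + e.getMessage()); // record the failure, keep going
            }
        }
        return results;
    }
}
```

The one-result-per-op contract is what distinguishes this from the transactional multi, which reports a single abort point and rolls back.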
[jira] [Commented] (ZOOKEEPER-1568) multi should have a non-transaction version
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483635#comment-13483635 ]

Ted Yu commented on ZOOKEEPER-1568:
---

bq. it aborts on the first op that fails and rolls back

Should we allow operations after the failed operation to continue? The rationale is that the operations in the batch may not have dependencies among them.

multi should have a non-transaction version

Key: ZOOKEEPER-1568
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1568
Project: ZooKeeper
Issue Type: Improvement
Reporter: Jimmy Xiang
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483737#comment-13483737 ]

Ted Yu commented on ZOOKEEPER-1560:
---

I got the following based on the above code snippet:
{code}
Index: src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
===
--- src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java (revision 1401904)
+++ src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java (working copy)
@@ -111,18 +111,18 @@
                 cnxn.sendThread.clientTunneledAuthenticationInProgress());
         if (p != null) {
-            outgoingQueue.removeFirstOccurrence(p);
             updateLastSend();
             if ((p.requestHeader != null) &&
                     (p.requestHeader.getType() != OpCode.ping) &&
                     (p.requestHeader.getType() != OpCode.auth)) {
                 p.requestHeader.setXid(cnxn.getXid());
             }
-            p.createBB();
+            if (p.bb == null) p.createBB();
             ByteBuffer pbb = p.bb;
             sock.write(pbb);
             if (!pbb.hasRemaining()) {
                 sentCount++;
+                outgoingQueue.removeFirstOccurrence(p);
                 if (p.requestHeader != null
                         && p.requestHeader.getType() != OpCode.ping
                         && p.requestHeader.getType() != OpCode.auth) {
@@ -141,8 +141,12 @@
             synchronized(pendingQueue) {
                 pendingQueue.addAll(pending);
             }
-        }
+            if (outgoingQueue.isEmpty()) {
+                disableWrite();
+            } else {
+                enableWrite();
+            }
         }
     }

     private Packet findSendablePacket(LinkedList<Packet> outgoingQueue,
{code}
I still saw testLargeNodeData fail:
{code}
Testcase: testLargeNodeData took 0.714 sec
    Caused an ERROR
KeeperErrorCode = ConnectionLoss for /large
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /large
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
	at org.apache.zookeeper.test.ClientTest.testLargeNodeData(ClientTest.java:61)
{code}

Zookeeper client hangs on creation of large nodes

Key: ZOOKEEPER-1560
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
Fix For: 3.5.0, 3.4.5
Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt
[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483823#comment-13483823 ]

Ted Yu commented on ZOOKEEPER-1560:
---

I left some minor comments on review board. Nice work, Skye.

Zookeeper client hangs on creation of large nodes

Key: ZOOKEEPER-1560
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
Project: ZooKeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
Fix For: 3.5.0, 3.4.5
Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt, ZOOKEEPER-1560-v8.patch
[jira] [Commented] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569138#comment-13569138 ]

Ted Yu commented on ZOOKEEPER-1624:
---

The fix would be backported to 3.4, right?

PrepRequestProcessor abort multi-operation incorrectly

Key: ZOOKEEPER-1624
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624
Project: ZooKeeper
Issue Type: Bug
Components: server
Reporter: Thawan Kooburat
Assignee: Thawan Kooburat
Priority: Critical
Labels: zk-review
Fix For: 3.5.0
Attachments: ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch

We found this issue when trying to issue multiple instances of the following multi-op concurrently:

multi {
1. create sequential node /a-
2. create node /b
}

The expected result is that only the first multi-op request should succeed and the rest of the requests should fail because /b already exists. However, the reported result is that the subsequent multi-ops failed because sequential node creation failed, which should not be possible. Below are the return codes for each sub-op when issuing 3 instances of the above multi-op asynchronously:

1. ZOK, ZOK
2. ZOK, ZNODEEXISTS
3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY

When I added more debug logging, the cause turned out to be that PrepRequestProcessor rolls back outstandingChanges of the second multi-op incorrectly, causing the generated sequential node names to be wrong. Below are the sequential node names generated by PrepRequestProcessor:

1. create /a-0001
2. create /a-0003
3. create /a-0001

The bug is in the getPendingChanges() method: it failed to copy the ChangeRecord for the parent node (/), so rollbackPendingChanges() cannot restore the right previous change record of the parent node when aborting the second multi-op. The impact of this bug is that sequential node creation on the same parent node may fail until the previous one is committed. I am not sure if there are other implications or not.
[jira] [Commented] (ZOOKEEPER-1495) ZK client hangs when using a function not available on the server.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576219#comment-13576219 ]

Ted Yu commented on ZOOKEEPER-1495:
---

What about 1495.br33.v3.patch? It would be really useful for clients to see whether a certain operation is supported.

ZK client hangs when using a function not available on the server.

Key: ZOOKEEPER-1495
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1495
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.2, 3.3.5
Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Fix For: 3.5.0, 3.4.6
Attachments: 1495.br33.v3.patch, ZOOKEEPER-1495.2.patch, ZOOKEEPER-1495_branch34.patch, ZOOKEEPER-1495.patch

This happens for example when using zk#multi with a 3.4 client but a 3.3 server. The issue seems to be on the server side: the server drops packets with an unknown OpCode in ZooKeeperServer#submitRequest:
{noformat}
public void submitRequest(Request si) {
    // snip
    try {
        touch(si.cnxn);
        boolean validpacket = Request.isValid(si.type); // <=== Check on case OpCode.*
        if (validpacket) {
            // snip
        } else {
            LOG.warn("Dropping packet at server of type " + si.type);
            // if invalid packet drop the packet.
        }
    } catch (MissingSessionException e) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Dropping request: " + e.getMessage());
        }
    }
}
{noformat}
The solution discussed in ZOOKEEPER-1381 would be to get an exception on the client side, then close the session.
[jira] [Commented] (ZOOKEEPER-1665) Support recursive deletion in multi
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825047#comment-13825047 ]

Ted Yu commented on ZOOKEEPER-1665:
---

bq. and return without rollback the previously committed operations.

If this limitation cannot be lifted with reasonable effort, I can resolve this JIRA.

Support recursive deletion in multi

Key: ZOOKEEPER-1665
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1665
Project: ZooKeeper
Issue Type: New Feature
Reporter: Ted Yu

The use case in HBase is that we need to recursively delete multiple subtrees:
{code}
ZKUtil.deleteChildrenRecursively(watcher, acquiredZnode);
ZKUtil.deleteChildrenRecursively(watcher, reachedZnode);
ZKUtil.deleteChildrenRecursively(watcher, abortZnode);
{code}
To achieve high consistency, it is desirable to use multi for the above operations. This JIRA adds support for recursive deletion in multi.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
Ted Yu created ZOOKEEPER-1859: - Summary: pwriter should be closed in NIOServerCnxn#checkFourLetterWord() Key: ZOOKEEPER-1859 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859 Project: ZooKeeper Issue Type: Bug Reporter: Ted Yu Priority: Minor
{code}
final PrintWriter pwriter = new PrintWriter(
        new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
    cleanupWriterSocket(null);
    return true;
}
{code}
pwriter should be closed in the telnetCloseCmd case as well.
[jira] [Created] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager
Ted Yu created ZOOKEEPER-1861: - Summary: ConcurrentHashMap isn't used properly in QuorumCnxManager Key: ZOOKEEPER-1861 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861 Project: ZooKeeper Issue Type: Bug Reporter: Ted Yu Priority: Minor queueSendMap is a ConcurrentHashMap. At line 210:
{code}
if (!queueSendMap.containsKey(sid)) {
    queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY));
{code}
By the time control enters the if block, another thread may have concurrently put a queue for the same sid into the ConcurrentHashMap, so the check-then-put is not atomic. putIfAbsent() should be used. A similar issue occurs at line 307 as well.
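The check-then-act race and the putIfAbsent fix can be sketched as follows; the map layout mirrors the description above, but this is a standalone illustration, not the QuorumCnxManager code.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative: atomic insert-if-missing on a ConcurrentHashMap,
// replacing the racy containsKey()/put() pair.
public class QueueMapSketch {
    static final int SEND_CAPACITY = 1;
    static final ConcurrentHashMap<Long, ArrayBlockingQueue<ByteBuffer>> queueSendMap =
            new ConcurrentHashMap<>();

    /** Returns the queue for sid, creating it atomically if absent. */
    static ArrayBlockingQueue<ByteBuffer> queueFor(long sid) {
        ArrayBlockingQueue<ByteBuffer> fresh = new ArrayBlockingQueue<>(SEND_CAPACITY);
        ArrayBlockingQueue<ByteBuffer> prior = queueSendMap.putIfAbsent(sid, fresh);
        // If another thread won the race, prior is its queue; use that one.
        return prior != null ? prior : fresh;
    }

    public static void main(String[] args) {
        // Every caller for the same sid observes the same queue instance.
        System.out.println(queueFor(1L) == queueFor(1L));  // true
    }
}
```

The cost of this idiom is that the losing thread allocates a queue that is immediately discarded, which is the "extra allocation" discussed in the comments below.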
[jira] [Updated] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-1861: -- Attachment: zookeeper-1861-v1.txt Sure. Here is the patch.
[jira] [Updated] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-1861: -- Attachment: zookeeper-1861-v2.txt Patch v2 addresses Michi's comments. ConcurrentHashMap isn't used properly in QuorumCnxManager - Key: ZOOKEEPER-1861 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861 Project: ZooKeeper Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt
[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870966#comment-13870966 ] Ted Yu commented on ZOOKEEPER-1861: --- [~michim]: Can you take a look at patch v2 ?
[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13871657#comment-13871657 ] Ted Yu commented on ZOOKEEPER-1861: --- To avoid allocating an extra ArrayBlockingQueue, I am thinking of the following:
* create a singleton ArrayBlockingQueue which serves as a marker
* if queueSendMap.putIfAbsent(sid, singleton) returns null, create the real ArrayBlockingQueue, named bq, and call queueSendMap.replace(sid, bq)
* if queueSendMap.putIfAbsent(sid, singleton) returns a non-null value, check whether the returned value is the singleton; if so, wait till queueSendMap.get(sid) returns a value which is not the singleton
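On Java 8 and later, ConcurrentHashMap.computeIfAbsent reaches the same goal (exactly one queue allocated per sid, no marker object, no waiting loop) in a single call. It was not available when this comment was written, so treat this as an alternative sketch rather than the patch under discussion.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative: computeIfAbsent runs the factory at most once per key,
// so no extra ArrayBlockingQueue is ever allocated and discarded.
public class ComputeIfAbsentSketch {
    static final ConcurrentHashMap<Long, ArrayBlockingQueue<ByteBuffer>> queueSendMap =
            new ConcurrentHashMap<>();

    static ArrayBlockingQueue<ByteBuffer> queueFor(long sid) {
        // The lambda runs only for the caller that wins the insertion race.
        return queueSendMap.computeIfAbsent(sid, k -> new ArrayBlockingQueue<>(1));
    }

    public static void main(String[] args) {
        // Repeated calls for one sid return the single shared queue.
        System.out.println(queueFor(1L) == queueFor(1L));  // true
    }
}
```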
[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872288#comment-13872288 ] Ted Yu commented on ZOOKEEPER-1861: --- The above suggestion would involve more complex logic. Maybe the first two hunks in patch v2 can be integrated first ?
[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897335#comment-13897335 ] Ted Yu commented on ZOOKEEPER-1861: --- Further review on this would be appreciated.
[jira] [Updated] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-1861: -- Attachment: zookeeper-1861-v3.txt How about patch v3 ?
[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898727#comment-13898727 ] Ted Yu commented on ZOOKEEPER-1861: ---
bq. I prefer patch v2
I agree. Patch v3 basically makes the map a HashMap.
bq. then create, and put if not absent
I guess you meant 'put if absent'. The chance of an extra allocation should be low.
[jira] [Commented] (ZOOKEEPER-1665) Support recursive deletion in multi
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931718#comment-13931718 ] Ted Yu commented on ZOOKEEPER-1665: --- The snippet looks good. Patch is welcome. Thanks
[jira] [Resolved] (ZOOKEEPER-1665) Support recursive deletion in multi
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved ZOOKEEPER-1665. --- Resolution: Won't Fix
[jira] [Created] (ZOOKEEPER-2064) Prevent resource leak in various classes
Ted Yu created ZOOKEEPER-2064: - Summary: Prevent resource leak in various classes Key: ZOOKEEPER-2064 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064 Project: ZooKeeper Issue Type: Bug Reporter: Ted Yu In various classes there are potential resource leaks, e.g. a LogIterator / RandomAccessFileReader is not closed upon return from the method. The corresponding close() should be called to prevent the resource leak.
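Leaks of this shape are usually fixed with try-with-resources, which closes the resource on every exit path, including early returns and exceptions. The LogIterator below is a stand-in stub, not the real ZooKeeper class; only the closing pattern is the point.

```java
// Illustrative: try-with-resources guarantees close() on all return paths.
public class CloseOnReturnSketch {
    /** Stand-in for a resource such as LogIterator / RandomAccessFileReader. */
    static class LogIterator implements AutoCloseable {
        static boolean closed = false;
        int next() { return 42; }
        @Override public void close() { closed = true; }
    }

    static int readFirst() {
        // Without this block, an early return or exception leaks the iterator.
        try (LogIterator it = new LogIterator()) {
            return it.next();
        }  // close() runs here even though we returned from inside the block
    }

    public static void main(String[] args) {
        System.out.println(readFirst());         // 42
        System.out.println(LogIterator.closed);  // true
    }
}
```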
[jira] [Updated] (ZOOKEEPER-2064) Prevent resource leak in various classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-2064: -- Attachment: 2064-v1.txt Tentative patch.
[jira] [Updated] (ZOOKEEPER-2064) Prevent resource leak in various classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-2064: -- Attachment: 2064-v2.txt Patch v2 is based on the latest trunk.
[jira] [Commented] (ZOOKEEPER-2064) Prevent resource leak in various classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208924#comment-14208924 ] Ted Yu commented on ZOOKEEPER-2064: --- I ran the failed tests locally. ReconfigRecoveryTest#testCurrentObserverIsParticipantInNewConfig fails with or without my patch.
[jira] [Commented] (ZOOKEEPER-2064) Prevent resource leak in various classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208931#comment-14208931 ] Ted Yu commented on ZOOKEEPER-2064: --- Correction: ReconfigRecoveryTest#testCurrentServersAreObserversInNextConfig failed with the patch; ReconfigRecoveryTest#testCurrentObserverIsParticipantInNewConfig failed without the patch.
[jira] [Created] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently
Ted Yu created ZOOKEEPER-2080: - Summary: ReconfigRecoveryTest fails intermittently Key: ZOOKEEPER-2080 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080 Project: ZooKeeper Issue Type: Test Reporter: Ted Yu Priority: Minor I got the following test failure on MacBook with trunk code:
{code}
Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
    FAILED
waiting for server 2 being up
junit.framework.AssertionFailedError: waiting for server 2 being up
    at org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
{code}
[jira] [Commented] (ZOOKEEPER-2064) Prevent resource leak in various classes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220142#comment-14220142 ] Ted Yu commented on ZOOKEEPER-2064: --- Thanks Flavio. Prevent resource leak in various classes Key: ZOOKEEPER-2064 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064 Project: ZooKeeper Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Critical Fix For: 3.4.7, 3.5.1, 3.6.0 Attachments: 2064-v1.txt, 2064-v2.txt, ZOOKEEPER-2064.patch
[jira] [Updated] (ZOOKEEPER-2105) PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-2105: -- Attachment: zookeeper-2105-v1.patch PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord -- Key: ZOOKEEPER-2105 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2105 Project: ZooKeeper Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: zookeeper-2105-v1.patch
{code}
final PrintWriter pwriter = new PrintWriter(
        new BufferedWriter(new SendBufferWriter()));
{code}
pwriter should be closed upon return from the method.
[jira] [Created] (ZOOKEEPER-2105) PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord
Ted Yu created ZOOKEEPER-2105: - Summary: PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord Key: ZOOKEEPER-2105 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2105 Project: ZooKeeper Issue Type: Bug Reporter: Ted Yu Priority: Minor
[jira] [Commented] (ZOOKEEPER-2105) PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272192#comment-14272192 ] Ted Yu commented on ZOOKEEPER-2105: --- NettyServerCnxn#checkFourLetterWord() has a similar issue.
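A sketch of the shape of the fix, under the assumption (hypothetical stubs, mirroring the names in the snippets above) that the handler's cleanup helper is responsible for closing: hand the writer to the cleanup call on every branch instead of passing null on the telnet-close path.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Illustrative: make every exit branch hand the writer to cleanup,
// so the PrintWriter is always closed. checkFourLetterWord and
// cleanupWriterSocket mirror the JIRA snippets, but this is a stub,
// not the NIOServerCnxn/NettyServerCnxn code.
public class FourLetterSketch {
    static boolean writerClosed = false;

    static void cleanupWriterSocket(PrintWriter pwriter) {
        if (pwriter != null) {
            pwriter.flush();
            pwriter.close();
            writerClosed = true;
        }
    }

    static boolean checkFourLetterWord(int len, int telnetCloseCmd) {
        PrintWriter pwriter = new PrintWriter(new StringWriter());
        if (len == telnetCloseCmd) {
            cleanupWriterSocket(pwriter);  // was cleanupWriterSocket(null): leak
            return true;
        }
        cleanupWriterSocket(pwriter);      // normal path also closes
        return false;
    }

    public static void main(String[] args) {
        checkFourLetterWord(0xfff4, 0xfff4);
        System.out.println(writerClosed);  // true
    }
}
```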
[jira] [Resolved] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved ZOOKEEPER-2080. --- Resolution: Cannot Reproduce ReconfigRecoveryTest fails intermittently - Key: ZOOKEEPER-2080 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080 Project: ZooKeeper Issue Type: Sub-task Reporter: Ted Yu Priority: Minor
[jira] [Commented] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369593#comment-14369593 ] Ted Yu commented on ZOOKEEPER-2080: --- Looks like the test hasn't failed recently.
[jira] [Commented] (ZOOKEEPER-2170) Zookeeper is not logging as per the configuration in log4j.properties
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935526#comment-14935526 ] Ted Yu commented on ZOOKEEPER-2170: --- If I am not mistaken, 3.4.6 has this issue as well. When can I expect this to be fixed ? Thanks > Zookeeper is not logging as per the configuration in log4j.properties > - > > Key: ZOOKEEPER-2170 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2170 > Project: ZooKeeper > Issue Type: Bug > Reporter: Arshad Mohammad > Assignee: Arshad Mohammad > Fix For: 3.6.0 > > Attachments: ZOOKEEPER-2170-002.patch, ZOOKEEPER-2170-003.patch, > ZOOKEEPER-2170.001.patch > > > In conf/log4j.properties the default root logger is > {code} > zookeeper.root.logger=INFO, CONSOLE > {code} > Changing the root logger to the below value, or any other value, has no effect on logging > {code} > zookeeper.root.logger=DEBUG, ROLLINGFILE > {code}
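For context, a minimal conf/log4j.properties of the shape the report describes; the appender details are illustrative, and only the zookeeper.root.logger property is the one under discussion.

```properties
# Root logger the report tries to change; per the bug, edits here were ignored.
zookeeper.root.logger=DEBUG, ROLLINGFILE
log4j.rootLogger=${zookeeper.root.logger}

# Illustrative rolling-file appender referenced above.
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.File=zookeeper.log
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c - %m%n
```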
[jira] [Updated] (ZOOKEEPER-2347) Deadlock shutting down zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-2347: -- Attachment: testSplitLogManager.stack Stack trace showing the issue > Deadlock shutting down zookeeper > > > Key: ZOOKEEPER-2347 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.7 >Reporter: Ted Yu >Priority: Critical > Attachments: testSplitLogManager.stack > > > HBase recently upgraded to zookeeper 3.4.7 > In one of the tests, TestSplitLogManager, there is reproducible hang at the > end of the test. > Below is snippet from stack trace related to zookeeper: > {code} > "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on > condition [0x00011834b000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0007c5b8d3a0> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501) > "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 > nid=0x9513 waiting on condition [0x000118042000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101) > at > org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060) > "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor > entry [0x0001170ac000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > 
org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512) > - waiting to lock <0x0007c5b62128> (a > org.apache.zookeeper.server.ZooKeeperServer) > at > org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131) > "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on > condition [0x000117a3] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0007c9b106b8> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501) > "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() > [0x000108aa1000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x0007c5b66400> (a > org.apache.zookeeper.server.SyncRequestProcessor) > at java.lang.Thread.join(Thread.java:1281) > - locked <0x0007c5b66400> (a > org.apache.zookeeper.server.SyncRequestProcessor) > at java.lang.Thread.join(Thread.java:1355) > at > org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213) > at > org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770) > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478) > - locked <0x0007c5b62128> (a > org.apache.zookeeper.server.ZooKeeperServer) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266) > at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301) > {code} > Note the address (0x0007c5b66400) in the last hunk which seems to > indicate some form of deadlock. > According to Camille Fournier: > We made shutdown synchronized. But decrementing the requests is > also synchronized and called from a different thread. So yeah, deadlock. > This came in with ZOOKEEPER-1907 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ZOOKEEPER-2347) Deadlock shutting down zookeeper
Ted Yu created ZOOKEEPER-2347: - Summary: Deadlock shutting down zookeeper Key: ZOOKEEPER-2347 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347 Project: ZooKeeper Issue Type: Bug Affects Versions: 3.4.7 Reporter: Ted Yu Priority: Critical HBase recently upgraded to zookeeper 3.4.7 In one of the tests, TestSplitLogManager, there is reproducible hang at the end of the test. Below is snippet from stack trace related to zookeeper: {code} "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on condition [0x00011834b000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0007c5b8d3a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501) "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 nid=0x9513 waiting on condition [0x000118042000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101) at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060) "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor entry [0x0001170ac000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512) - waiting to lock <0x0007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144) at 
org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131) "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on condition [0x000117a3] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0007c9b106b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501) "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() [0x000108aa1000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x0007c5b66400> (a org.apache.zookeeper.server.SyncRequestProcessor) at java.lang.Thread.join(Thread.java:1281) - locked <0x0007c5b66400> (a org.apache.zookeeper.server.SyncRequestProcessor) at java.lang.Thread.join(Thread.java:1355) at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213) at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770) at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478) - locked <0x0007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer) at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266) at org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301) {code} Note the address (0x0007c5b66400) in the last hunk which seems to indicate some form of deadlock. According to Camille Fournier: We made shutdown synchronized. But decrementing the requests is also synchronized and called from a different thread. 
So yeah, deadlock. This came in with ZOOKEEPER-1907 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
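The cycle above is: main holds the ZooKeeperServer monitor (synchronized shutdown) while join()ing the sync thread, and the sync thread blocks entering the synchronized decInProcess on that same monitor. One way out, sketched below with simplified stand-in classes, is to keep the in-flight counter in an AtomicInteger so decrementing never needs the server monitor; this mirrors the general fix direction, not the actual ZOOKEEPER-2347 patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative: decInProcess no longer synchronizes on the server object,
// so a thread calling it cannot block behind a synchronized shutdown()
// that is join()ing that very thread.
public class ShutdownSketch {
    static class Server {
        private final AtomicInteger requestsInProcess = new AtomicInteger();

        void incInProcess() { requestsInProcess.incrementAndGet(); }

        // Previously a synchronized method -> deadlocked against shutdown();
        // now lock-free.
        void decInProcess() { requestsInProcess.decrementAndGet(); }

        int inProcess() { return requestsInProcess.get(); }

        synchronized void shutdown(Thread syncThread) throws InterruptedException {
            // Holding this monitor is now safe: syncThread's decInProcess()
            // no longer needs it, so join() can complete.
            syncThread.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Server server = new Server();
        server.incInProcess();
        Thread syncThread = new Thread(server::decInProcess);
        syncThread.start();
        server.shutdown(syncThread);             // completes; no deadlock
        System.out.println(server.inProcess());  // 0
    }
}
```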
[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062994#comment-15062994 ]

Ted Yu commented on ZOOKEEPER-2347:
-----------------------------------

Not sure how I can test this with HBase unit test(s). As far as I know, ZooKeeper still uses Ant to build, while the HBase dependency is expressed through Maven.

> Deadlock shutting down zookeeper
> --------------------------------
>
>                 Key: ZOOKEEPER-2347
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.7
>            Reporter: Ted Yu
>            Assignee: Rakesh R
>            Priority: Blocker
>             Fix For: 3.4.8
>
>         Attachments: ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>
> HBase recently upgraded to ZooKeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at the end of the test.
> Below is a snippet from the stack trace related to ZooKeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on condition [0x00011834b000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x0007c5b8d3a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 nid=0x9513 waiting on condition [0x000118042000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>         at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor entry [0x0001170ac000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>         - waiting to lock <0x0007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer)
>         at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>         at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>         at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on condition [0x000117a3]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x0007c9b106b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() [0x000108aa1000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x0007c5b66400> (a org.apache.zookeeper.server.SyncRequestProcessor)
>         at java.lang.Thread.join(Thread.java:1281)
>         - locked <0x0007c5b66400> (a org.apache.zookeeper.server.SyncRequestProcessor)
>         at java.lang.Thread.join(Thread.java:1355)
>         at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>         at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>         at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>         - locked <0x0007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>         at org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk, which seems to indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
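The quoted thread dump shows a classic self-join under a monitor: shutdown() is synchronized on the ZooKeeperServer object and join()s the sync thread, while that thread is BLOCKED trying to enter the synchronized decInProcess() on the same object, so join() can never return. The miniature below is a hypothetical sketch of that lock pattern only (class and field names are illustrative, not ZooKeeper's actual code); it uses a timed join() so the demonstration terminates instead of hanging.

```java
// MiniServer mimics the lock pattern only: shutdown() holds this object's
// monitor while joining a worker that needs that same monitor.
class MiniServer {
    Thread syncThread;
    volatile boolean workerWasStuck;

    synchronized void decInProcess() {
        // Needs the server monitor, like ZooKeeperServer.decInProcess().
    }

    synchronized void shutdown() {
        syncThread = new Thread(this::decInProcess);
        syncThread.start();
        try {
            // The real code joins with no timeout and therefore hangs
            // forever; a 200 ms timeout lets this sketch return.
            syncThread.join(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // The worker is still alive: it is BLOCKED on the monitor that
        // this very method holds.
        workerWasStuck = syncThread.isAlive();
    }
}
```

A general remedy for this pattern is to avoid holding the object's monitor while joining worker threads that themselves need it.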
[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063086#comment-15063086 ]

Ted Yu commented on ZOOKEEPER-2347:
-----------------------------------

Thanks for the pointer, Chris. I ran TestSplitLogManager twice after modifying pom.xml, and it passed both times. Previously the test hung quite reliably on Mac.
[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081602#comment-15081602 ]

Ted Yu commented on ZOOKEEPER-2347:
-----------------------------------

[~fpj]: Can you review the patch?
[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075556#comment-15075556 ]

Ted Yu commented on ZOOKEEPER-2347:
-----------------------------------

Rakesh: Thanks for updating the test case.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089636#comment-15089636 ]

Ted Yu commented on ZOOKEEPER-1936:
-----------------------------------

[~fpj]: Can you take a look? Thanks

> Server exits when unable to create data directory due to race
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1936
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Harald Musum
>            Assignee: Andrew Purtell
>            Priority: Minor
>         Attachments: ZOOKEEPER-1936.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this error in the log:
> [2014-05-27 09:29:48.248] ERROR : - .org.apache.zookeeper.server.ZooKeeperServerMain
> Unexpected exception, exiting abnormally
> exception=
> java.io.IOException: Unable to create data directory /home/y/var/zookeeper/version-2
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> Stack trace from the JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable [0x7f55d7dc7000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createDirectory(Native Method)
>         at java.io.File.mkdir(File.java:1310)
>         at java.io.File.mkdirs(File.java:1337)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>         at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
>         at org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
>         at java.util.TimerThread.mainLoop(Timer.java:555)
>         at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable [0x7f55d7ed8000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createDirectory(Native Method)
>         at java.io.File.mkdir(File.java:1310)
>         at java.io.File.mkdirs(File.java:1337)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>         at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it can run at the same time as the server itself is starting. FileTxnSnapLog() checks whether the directory exists and creates it if not. The two tasks do this at the same time, mkdir fails, and the server exits the JVM.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
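The race described above (PurgeTask and server startup both inside File.mkdirs() for the same directory) has a standard idempotent fix: treat a false return from mkdirs() as fatal only when the directory still does not exist afterward. The sketch below illustrates that check; the class and method names are invented for the example and are not the code of the attached patch.

```java
import java.io.File;
import java.io.IOException;

// Race-tolerant directory creation: if mkdirs() returns false because a
// concurrent thread created the directory first, that is not an error.
class DataDirs {
    static void ensureDataDir(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }
}
```

Two threads (or two successive calls) can safely race through ensureDataDir; whichever mkdirs() loses simply observes the directory already present.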
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089595#comment-15089595 ]

Ted Yu commented on ZOOKEEPER-1936:
-----------------------------------

We encountered this issue during testing, though intermittently. Can the fix be committed? [~shralex] [~phunt]
[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095073#comment-15095073 ]

Ted Yu commented on ZOOKEEPER-2347:
-----------------------------------

Assuming there was only a test change since I performed validation last year, this should be good to go.
[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated ZOOKEEPER-1936:
------------------------------
    Attachment: ZOOKEEPER-1936.v3.patch
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102276#comment-15102276 ] Ted Yu commented on ZOOKEEPER-1936: --- Patch v3 addresses comments from Chris and Rakesh. The same patch can be applied smoothly on branch-3.4 Let me know if separate patch for branch-3.4 should be attached. > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Andrew Purtell >Priority: Minor > Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, > ZOOKEEPER-1936.v3.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] 
[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-1936: -- Attachment: ZOOKEEPER-1936.v2.patch Alternate patch for consideration: only throw the exception if dataDir doesn't exist and the mkdirs() call fails.
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123934#comment-15123934 ] Ted Yu commented on ZOOKEEPER-1936: --- Haven't had a chance to reproduce the bug. After a QE fix, the hbase unsecure deployment works reliably.
[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-1936: -- Attachment: ZOOKEEPER-1936.v4.patch Patch v4 addresses Chris' comment above.
[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-1936: -- Attachment: ZOOKEEPER-1936.v4.patch
[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-1936: -- Attachment: (was: ZOOKEEPER-1936.v4.patch)
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108068#comment-15108068 ] Ted Yu commented on ZOOKEEPER-1936: --- The previous patch was generated for branch-3.4. Attached a patch for trunk.
[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-1936: -- Attachment: ZOOKEEPER-1936.v3.patch
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108940#comment-15108940 ] Ted Yu commented on ZOOKEEPER-1936: --- The failure at https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3010//testReport/org.apache.zookeeper.test/AsyncHammerTest/testHammer/ doesn't seem to be related to the patch.
[jira] [Created] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value
Ted Yu created ZOOKEEPER-2384: - Summary: Support atomic increment / decrement of znode value Key: ZOOKEEPER-2384 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2384 Project: ZooKeeper Issue Type: Improvement Reporter: Ted Yu

The use case is storing a reference count (integer type) in a znode. It is desirable to support atomic increment / decrement of the znode value. Suggestion from Flavio: you can read the znode, keep the version of the znode, update the value, and write back conditionally; the condition for the setData operation to succeed is that the version is the same as the one read. While the above is feasible, the developer has to implement the retry logic themselves, and it is not easy to combine increment / decrement with other operations using multi.
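The read / keep-version / conditional-write loop Flavio suggests can be sketched as follows. This is a simulation, not ZooKeeper's actual API: the FakeZnode class stands in for the real client's getData (returning data plus a Stat with a version) and setData(path, data, version), which fails with a BadVersion error when another writer updated the znode first.

```python
class BadVersionError(Exception):
    """Stands in for KeeperException.BadVersionException."""

class FakeZnode:
    # In-memory stand-in for a znode: a value plus a data version
    # that increments on every successful write.
    def __init__(self, value=0):
        self.value = value
        self.version = 0

    def get_data(self):
        return self.value, self.version

    def set_data(self, value, expected_version):
        # Conditional write: succeeds only if no one wrote in between.
        if expected_version != self.version:
            raise BadVersionError()
        self.value = value
        self.version += 1

def atomic_increment(znode, delta, max_retries=10):
    # Retry the read-modify-conditional-write until it lands.
    for _ in range(max_retries):
        value, version = znode.get_data()
        try:
            znode.set_data(value + delta, expected_version=version)
            return value + delta
        except BadVersionError:
            continue  # lost the race; re-read and try again
    raise RuntimeError("increment did not converge")

node = FakeZnode(value=41)
result = atomic_increment(node, 1)  # -> 42
```

This is exactly the retry boilerplate the issue argues every developer currently has to rewrite, which is the motivation for supporting the operation server-side.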
[jira] [Updated] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-2384: -- Labels: atomic (was: )
[jira] [Updated] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-2384: -- Description: Use case is to store reference count (integer type) in znode. It is desirable to provide support for atomic increment / decrement of the znode value. Suggestion from Flavio: {quote} you can read the znode, keep the version of the znode, update the value, write back conditionally. The condition for the setData operation to succeed is that the version is the same that it read {quote} While the above is feasible, developer has to implement retry logic him/herself. It is not easy to combine increment / decrement with other operations using multi. was: Use case is to store reference count (integer type) in znode. It is desirable to provide support for atomic increment / decrement of the znode value. Suggestion from Flavio: you can read the znode, keep the version of the znode, update the value, write back conditionally. The condition for the setData operation to succeed is that the version is the same that it read While the above is feasible, developer has to implement retry logic him/herself. It is not easy to combine increment / decrement with other operations using multi. > Support atomic increment / decrement of znode value > --- > > Key: ZOOKEEPER-2384 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2384 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Ted Yu > Labels: atomic > > Use case is to store reference count (integer type) in znode. > It is desirable to provide support for atomic increment / decrement of the > znode value. > Suggestion from Flavio: > {quote} > you can read the znode, keep the version of the znode, update the value, > write back conditionally. The condition for the setData operation to succeed > is that the version is the same that it read > {quote} > While the above is feasible, developer has to implement retry logic > him/herself. 
It is not easy to combine increment / decrement with other > operations using multi. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
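Flavio's read-the-version, write-back-conditionally suggestion above can be sketched as the retry loop below. The `FakeZnode` class is our in-memory stand-in, not a ZooKeeper class: it mimics `setData`'s version check (a mismatch is what surfaces as `BADVERSION` against a real server) so the loop is runnable without a live ensemble. With the real client the same shape uses `getData(path, false, stat)` followed by `setData(path, data, stat.getVersion())`.

```java
import java.nio.charset.StandardCharsets;

public class AtomicZnodeCounter {
    /** In-memory stand-in for a znode: a value plus a version bumped on every write. */
    static class FakeZnode {
        private byte[] data = "0".getBytes(StandardCharsets.UTF_8);
        private int version = 0;

        synchronized int getVersion() { return version; }
        synchronized byte[] getData() { return data.clone(); }

        /** Mimics setData's conditional semantics: refuse the write if the version moved. */
        synchronized boolean setData(byte[] newData, int expectedVersion) {
            if (expectedVersion != version) {
                return false; // a real server would reject with BADVERSION
            }
            data = newData.clone();
            version++;
            return true;
        }
    }

    /** The retry loop the issue notes every developer currently has to hand-roll. */
    static int increment(FakeZnode znode, int delta) {
        while (true) {
            int expected = znode.getVersion();
            int current = Integer.parseInt(new String(znode.getData(), StandardCharsets.UTF_8));
            byte[] next = Integer.toString(current + delta).getBytes(StandardCharsets.UTF_8);
            if (znode.setData(next, expected)) {
                return current + delta; // our conditional write won the race
            }
            // Another writer bumped the version first: re-read and try again.
        }
    }

    public static void main(String[] args) {
        FakeZnode znode = new FakeZnode();
        increment(znode, 1);
        increment(znode, 1);
        int value = increment(znode, -1);
        if (value != 1) throw new AssertionError("expected 1, got " + value);
        System.out.println("counter=" + value);
    }
}
```

The loop illustrates why the issue asks for a server-side primitive: correctness depends on every writer implementing this pattern, and the read-then-write pair cannot be folded into a single `multi` transaction.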
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550505#comment-15550505 ] Ted Yu commented on ZOOKEEPER-1936: --- Is there anything I can do to move this forward ? > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Ted Yu >Priority: Minor > Fix For: 3.4.10, 3.5.3 > > Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch, > ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, > ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] 
> Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do > this at the same time, and mkdir fails and server exits the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
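The race in the traces above: both PurgeTask and server startup see the data directory missing and call `File.mkdirs()`, which returns false for the loser even though the directory now exists. A race-tolerant creation helper, along the lines of the fix discussed here (the helper name is ours, not ZooKeeper's), re-tests existence after the failed call:

```java
import java.io.File;
import java.io.IOException;

public class SafeDirs {
    /**
     * Creates the directory if needed, tolerating a concurrent creator.
     * mkdirs() returning false is ambiguous: it means either a genuine
     * failure or that another thread created the directory first, so the
     * reliable check is whether the directory exists afterwards.
     */
    static void ensureDirectory(File dir) throws IOException {
        if (dir.mkdirs() || dir.isDirectory()) {
            return; // created by us, or already present (possibly created concurrently)
        }
        throw new IOException("Unable to create data directory " + dir);
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "zk-1936-race-demo");
        // Calling twice mimics PurgeTask and server startup both racing to create it:
        // the second call sees the directory already there and must not throw.
        ensureDirectory(dir);
        ensureDirectory(dir);
        System.out.println(dir.isDirectory());
    }
}
```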
[jira] [Updated] (ZOOKEEPER-2606) SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-2606: -- Priority: Minor (was: Major) > SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception > > > Key: ZOOKEEPER-2606 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2606 > Project: ZooKeeper > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > > {code} > LOG.info("Setting authorizedID: " + userNameBuilder); > ac.setAuthorizedID(userNameBuilder.toString()); > } catch (IOException e) { > LOG.error("Failed to set name based on Kerberos authentication > rules."); > } > {code} > On one cluster, we saw the following: > {code} > 2016-10-04 02:18:16,484 - ERROR > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@137] - > Failed to set name based on Kerberos authentication rules. > {code} > It would be helpful if the log contains information about the IOException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
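The change the issue asks for is small: pass the exception object to the logger instead of a bare message, so the IOException's message and stack trace reach the log. A sketch using `java.util.logging` so it runs standalone (the `describe` helper is ours, for illustration); with the slf4j-style `Logger` the equivalent one-liner is `LOG.error("Failed to set name based on Kerberos authentication rules.", e)`.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogWithCause {
    private static final Logger LOG = Logger.getLogger(LogWithCause.class.getName());

    /** Enriches the fixed message with the exception's own description. */
    static String describe(IOException e) {
        return "Failed to set name based on Kerberos authentication rules: " + e;
    }

    public static void main(String[] args) {
        try {
            throw new IOException("no rules applied to user principal");
        } catch (IOException e) {
            // Before the fix only the fixed string was logged; passing the
            // throwable also records its message and stack trace.
            LOG.log(Level.SEVERE, describe(e), e);
        }
    }
}
```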
[jira] [Created] (ZOOKEEPER-2606) SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception
Ted Yu created ZOOKEEPER-2606: - Summary: SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception Key: ZOOKEEPER-2606 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2606 Project: ZooKeeper Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu {code} LOG.info("Setting authorizedID: " + userNameBuilder); ac.setAuthorizedID(userNameBuilder.toString()); } catch (IOException e) { LOG.error("Failed to set name based on Kerberos authentication rules."); } {code} On one cluster, we saw the following: {code} 2016-10-04 02:18:16,484 - ERROR [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@137] - Failed to set name based on Kerberos authentication rules. {code} It would be helpful if the log contains information about the IOException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2606) SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-2606: -- Attachment: ZOOKEEPER-2606.v1.patch > SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception > > > Key: ZOOKEEPER-2606 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2606 > Project: ZooKeeper > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Attachments: ZOOKEEPER-2606.v1.patch > > > {code} > LOG.info("Setting authorizedID: " + userNameBuilder); > ac.setAuthorizedID(userNameBuilder.toString()); > } catch (IOException e) { > LOG.error("Failed to set name based on Kerberos authentication > rules."); > } > {code} > On one cluster, we saw the following: > {code} > 2016-10-04 02:18:16,484 - ERROR > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@137] - > Failed to set name based on Kerberos authentication rules. > {code} > It would be helpful if the log contains information about the IOException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-1936: -- Attachment: ZOOKEEPER-1936.v5.patch > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Ted Yu >Priority: Minor > Fix For: 3.4.10, 3.5.3 > > Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch, > ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, > ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] 
> Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do > this at the same time, and mkdir fails and server exits the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-1936: -- Attachment: ZOOKEEPER-1936.branch-3.4.patch > Server exits when unable to create data directory due to race > -- > > Key: ZOOKEEPER-1936 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6, 3.5.0 >Reporter: Harald Musum >Assignee: Ted Yu >Priority: Minor > Fix For: 3.4.10, 3.5.3 > > Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch, > ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, > ZOOKEEPER-1936.v4.patch > > > We sometime see issues with ZooKeeper server not starting and seeing this > error in the log: > [2014-05-27 09:29:48.248] ERROR : - > .org.apache.zookeeper.server.ZooKeeperServerMainUnexpected exception, > exiting abnormally\nexception=\njava.io.IOException: Unable to create data > directory /home/y/var/zookeeper/version-2\n\tat > org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t > [...] 
> Stack trace from JVM gives this: > "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable > [0x7f55d7dc7000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84) > at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68) > at > org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140) > at java.util.TimerThread.mainLoop(Timer.java:555) > at java.util.TimerThread.run(Timer.java:505) > "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable > [0x7f55d7ed8000] >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createDirectory(Native Method) > at java.io.File.mkdir(File.java:1310) > at java.io.File.mkdirs(File.java:1337) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84) > at > org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103) > at > org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86) > at > org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) > [...] > So it seems that when autopurge is used (as it is in our case), it might > happen at the same time as starting the server itself. In FileTxnSnapLog() it > will check if the directory exists and create it if not. These two tasks do > this at the same time, and mkdir fails and server exits the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613237#comment-15613237 ] Ted Yu commented on ZOOKEEPER-2080: --- Thanks for the effort, Michael. > ReconfigRecoveryTest fails intermittently > - > > Key: ZOOKEEPER-2080 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080 > Project: ZooKeeper > Issue Type: Sub-task >Reporter: Ted Yu >Assignee: Michael Han > Fix For: 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, > ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, > ZOOKEEPER-2080.patch, jacoco-ZOOKEEPER-2080.unzip-grows-to-70MB.7z, > repro-20150816.log, threaddump.log > > > I got the following test failure on MacBook with trunk code: > {code} > Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec > FAILED > waiting for server 2 being up > junit.framework.AssertionFailedError: waiting for server 2 being up > at > org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638611#comment-15638611 ] Ted Yu commented on ZOOKEEPER-2384: --- Thanks for the suggestion, Nick. > Support atomic increment / decrement of znode value > --- > > Key: ZOOKEEPER-2384 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2384 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Ted Yu > Labels: atomic > > Use case is to store reference count (integer type) in znode. > It is desirable to provide support for atomic increment / decrement of the > znode value. > Suggestion from Flavio: > {quote} > you can read the znode, keep the version of the znode, update the value, > write back conditionally. The condition for the setData operation to succeed > is that the version is the same that it read > {quote} > While the above is feasible, developer has to implement retry logic > him/herself. It is not easy to combine increment / decrement with other > operations using multi. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2606) SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-2606: -- Labels: security (was: ) > SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception > > > Key: ZOOKEEPER-2606 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2606 > Project: ZooKeeper > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Labels: security > Attachments: ZOOKEEPER-2606.v1.patch > > > {code} > LOG.info("Setting authorizedID: " + userNameBuilder); > ac.setAuthorizedID(userNameBuilder.toString()); > } catch (IOException e) { > LOG.error("Failed to set name based on Kerberos authentication > rules."); > } > {code} > On one cluster, we saw the following: > {code} > 2016-10-04 02:18:16,484 - ERROR > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@137] - > Failed to set name based on Kerberos authentication rules. > {code} > It would be helpful if the log contains information about the IOException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822028#comment-15822028 ] Ted Yu commented on ZOOKEEPER-2664: --- https://github.com/apache/zookeeper/pull/149 > ClientPortBindTest#testBindByAddress may fail due to "No such device" > exception > --- > > Key: ZOOKEEPER-2664 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664 > Project: ZooKeeper > Issue Type: Test >Affects Versions: 3.4.6 >Reporter: Ted Yu >Assignee: Ted Yu > Attachments: ZOOKEEPER-2664.v1.txt > > > Saw the following in a recent run: > {code} > Stacktrace > java.net.SocketException: No such device > at java.net.NetworkInterface.isLoopback0(Native Method) > at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390) > at > org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > Standard Output > 2017-01-12 23:20:43,792 [myid:] - INFO [main:ZKTestCase$1@50] - STARTING > testBindByAddress > 2017-01-12 23:20:43,795 [myid:] - INFO > [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD > testBindByAddress > 2017-01-12 23:20:43,799 [myid:] - INFO > [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED > testBindByAddress > java.net.SocketException: No such device > at java.net.NetworkInterface.isLoopback0(Native Method) > at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390) > at > org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) > at org.junit.runners.ParentRunner.run(ParentRunner.java:236) > at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) > at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532) > at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179) > at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030) > {code} > Proposed fix is to catch exception from isLoopback() call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
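The proposed fix can be sketched as below: an interface can disappear between `getNetworkInterfaces()` and the `isLoopback()` query (typical with transient virtual interfaces), which is when `isLoopback()` throws `SocketException: No such device`. Catching the exception per interface lets the scan skip the vanished interface instead of failing the whole test; the class and method names here are ours, not the test's.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;

public class LoopbackScan {
    /** Counts loopback interfaces, tolerating interfaces that vanish mid-scan. */
    static int countLoopbacks() {
        int loopbacks = 0;
        try {
            for (NetworkInterface nif : Collections.list(NetworkInterface.getNetworkInterfaces())) {
                try {
                    if (nif.isLoopback()) {
                        loopbacks++;
                    }
                } catch (SocketException e) {
                    // "No such device": the interface went away; skip it.
                }
            }
        } catch (SocketException e) {
            // Enumeration itself failed; report none rather than crash.
        }
        return loopbacks;
    }

    public static void main(String[] args) {
        System.out.println("loopback interfaces seen: " + countLoopbacks());
    }
}
```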
[jira] [Commented] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822029#comment-15822029 ] Ted Yu commented on ZOOKEEPER-2664: --- [~praste]: Looks like you mistakenly entered ZOOKEEPER-2664 which is not for log4j. > ClientPortBindTest#testBindByAddress may fail due to "No such device" > exception > --- > > Key: ZOOKEEPER-2664 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664 > Project: ZooKeeper > Issue Type: Test >Affects Versions: 3.4.6 >Reporter: Ted Yu >Assignee: Ted Yu > Attachments: ZOOKEEPER-2664.v1.txt > > > Saw the following in a recent run: > {code} > Stacktrace > java.net.SocketException: No such device > at java.net.NetworkInterface.isLoopback0(Native Method) > at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390) > at > org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > Standard Output > 2017-01-12 23:20:43,792 [myid:] - INFO [main:ZKTestCase$1@50] - STARTING > testBindByAddress > 2017-01-12 23:20:43,795 [myid:] - INFO > [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD > testBindByAddress > 2017-01-12 23:20:43,799 [myid:] - INFO > [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED > testBindByAddress > java.net.SocketException: No such device > at java.net.NetworkInterface.isLoopback0(Native Method) > at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390) > at > org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) > at org.junit.runners.ParentRunner.run(ParentRunner.java:236) > at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) > at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532) > at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179) > at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030) > {code} > Proposed fix is to catch exception from isLoopback() call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated ZOOKEEPER-2664: -- Comment: was deleted (was: https://github.com/apache/zookeeper/pull/149) > ClientPortBindTest#testBindByAddress may fail due to "No such device" > exception > --- > > Key: ZOOKEEPER-2664 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664 > Project: ZooKeeper > Issue Type: Test >Affects Versions: 3.4.6 >Reporter: Ted Yu >Assignee: Ted Yu > Attachments: ZOOKEEPER-2664.v1.txt > > > Saw the following in a recent run: > {code} > Stacktrace > java.net.SocketException: No such device > at java.net.NetworkInterface.isLoopback0(Native Method) > at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390) > at > org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > Standard Output > 2017-01-12 23:20:43,792 [myid:] - INFO [main:ZKTestCase$1@50] - STARTING > testBindByAddress > 2017-01-12 23:20:43,795 [myid:] - INFO > [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD > testBindByAddress > 2017-01-12 23:20:43,799 [myid:] - INFO > [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED > testBindByAddress > java.net.SocketException: No such device > at java.net.NetworkInterface.isLoopback0(Native Method) > at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390) > at > org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) > at org.junit.runners.ParentRunner.run(ParentRunner.java:236) > at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) > at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532) > at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179) > at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030) > {code} > Proposed fix is to catch exception from isLoopback() call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821881#comment-15821881 ] Ted Yu commented on ZOOKEEPER-2664: --- Since ZOOKEEPER-2395 didn't propose patch, I think we can proceed with patch review here. > ClientPortBindTest#testBindByAddress may fail due to "No such device" > exception > --- > > Key: ZOOKEEPER-2664 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664 > Project: ZooKeeper > Issue Type: Test >Affects Versions: 3.4.6 >Reporter: Ted Yu >Assignee: Ted Yu > Attachments: ZOOKEEPER-2664.v1.txt > > > Saw the following in a recent run: > {code} > Stacktrace > java.net.SocketException: No such device > at java.net.NetworkInterface.isLoopback0(Native Method) > at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390) > at > org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > Standard Output > 2017-01-12 23:20:43,792 [myid:] - INFO [main:ZKTestCase$1@50] - STARTING > testBindByAddress > 2017-01-12 23:20:43,795 [myid:] - INFO > [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD > testBindByAddress > 2017-01-12 23:20:43,799 [myid:] - INFO > [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED > testBindByAddress > java.net.SocketException: No such device > at java.net.NetworkInterface.isLoopback0(Native Method) > at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390) > at > org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) > at org.junit.runners.ParentRunner.run(ParentRunner.java:236) > at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) > at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532) > at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179) > at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030) > {code} > Proposed fix is to catch exception from isLoopback() call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
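The proposed fix — catching the exception from the isLoopback() call so a vanished interface does not fail the test — can be sketched as follows. This is a minimal illustration, not ZooKeeper's actual patch; `LoopbackProbe` and `isLoopbackSafe` are hypothetical names, mirroring the fact that NetworkInterface.isLoopback() declares SocketException.

```java
import java.net.SocketException;

public class LoopbackScan {
    /** Hypothetical probe mirroring NetworkInterface.isLoopback(),
     *  which may throw SocketException ("No such device"). */
    interface LoopbackProbe {
        boolean isLoopback() throws SocketException;
    }

    /** Treats a failed probe as "not loopback" instead of letting the
     *  exception abort the whole interface scan. */
    static boolean isLoopbackSafe(LoopbackProbe probe) {
        try {
            return probe.isLoopback();
        } catch (SocketException e) {
            // The interface vanished between enumeration and query: skip it.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLoopbackSafe(() -> true));
        System.out.println(isLoopbackSafe(() -> {
            throw new SocketException("No such device");
        }));
    }
}
```

The same pattern applied inside the loop over NetworkInterface.getNetworkInterfaces() would make ClientPortBindTest skip interfaces that disappear mid-enumeration rather than fail.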
[jira] [Updated] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated ZOOKEEPER-2664:
--
Attachment: ZOOKEEPER-2664.v1.txt

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception
Ted Yu created ZOOKEEPER-2664:
-

Summary: ClientPortBindTest#testBindByAddress may fail due to "No such device" exception
Key: ZOOKEEPER-2664
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
Project: ZooKeeper
Issue Type: Test
Affects Versions: 3.4.6
Reporter: Ted Yu

Saw the following in a recent run:
{code}
Stacktrace
java.net.SocketException: No such device
  at java.net.NetworkInterface.isLoopback0(Native Method)
  at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
  at org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
  at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
{code}
Proposed fix is to catch exception from isLoopback() call.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reassigned ZOOKEEPER-2664:
-
Assignee: Ted Yu

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated ZOOKEEPER-1859:
--
Description:
{code}
final PrintWriter pwriter = new PrintWriter(
    new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
    cleanupWriterSocket(null);
    return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd

was:
{code}
final PrintWriter pwriter = new PrintWriter(
    new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
    cleanupWriterSocket(null);
    return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd

> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: Ted Yu
> Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
>     new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
>     cleanupWriterSocket(null);
>     return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
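One standard way to guarantee the writer is closed on every exit path, including the early telnetCloseCmd return, is try-with-resources. The sketch below is illustrative only: `respondAndClose` is a hypothetical name, and a StringWriter stands in for NIOServerCnxn's SendBufferWriter.

```java
import java.io.BufferedWriter;
import java.io.PrintWriter;
import java.io.StringWriter;

public class PWriterSketch {
    /** Writes a reply and returns the buffered output. StringWriter is a
     *  stand-in for NIOServerCnxn's SendBufferWriter. */
    static String respondAndClose(String reply) {
        StringWriter sink = new StringWriter();
        // try-with-resources closes (and therefore flushes) pwriter on
        // every path out of the block, including early returns.
        try (PrintWriter pwriter = new PrintWriter(new BufferedWriter(sink))) {
            pwriter.print(reply);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(respondAndClose("imok"));
    }
}
```

In the actual code, which predates try-with-resources in parts of the 3.4 branch, an equivalent finally block closing pwriter would achieve the same effect.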
[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629504#comment-16629504 ]

Ted Yu commented on ZOOKEEPER-1936:
---

Can you outline how you plan to fix this? Thanks.

> Server exits when unable to create data directory due to race
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6, 3.5.0
> Reporter: Harald Musum
> Assignee: Ted Yu
> Priority: Minor
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
>
> We sometimes see issues with the ZooKeeper server not starting, with this error in the log:
> {code}
> [2014-05-27 09:29:48.248] ERROR : - .org.apache.zookeeper.server.ZooKeeperServerMain
> Unexpected exception, exiting abnormally
> exception=
> java.io.IOException: Unable to create data directory /home/y/var/zookeeper/version-2
>   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)
>   at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
>   at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>   at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> {code}
> Stack trace from JVM gives this:
> {code}
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable [0x7f55d7dc7000]
>    java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createDirectory(Native Method)
>   at java.io.File.mkdir(File.java:1310)
>   at java.io.File.mkdirs(File.java:1337)
>   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>   at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
>   at org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
>   at java.util.TimerThread.mainLoop(Timer.java:555)
>   at java.util.TimerThread.run(Timer.java:505)
>
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable [0x7f55d7ed8000]
>    java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createDirectory(Native Method)
>   at java.io.File.mkdir(File.java:1310)
>   at java.io.File.mkdirs(File.java:1337)
>   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>   at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
>   at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>   at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> {code}
> So it seems that when autopurge is used (as it is in our case), it can run at the same time as server startup. FileTxnSnapLog() checks whether the data directory exists and creates it if not; when both threads do this concurrently, one mkdir fails and the server exits the JVM.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
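The race described above can be made harmless by re-checking the directory after a failed mkdirs(), since File.mkdirs() returns false both on error and when another thread wins the creation race. A minimal sketch of that idea follows; `ensureDir` is a hypothetical helper for illustration, not ZooKeeper's actual fix.

```java
import java.io.File;

public class DataDirSketch {
    /** Race-tolerant directory creation: if mkdirs() returns false because a
     *  concurrent thread (e.g. the autopurge PurgeTask) created the directory
     *  first, the final isDirectory() re-check accepts that outcome instead
     *  of treating it as a fatal error. */
    static boolean ensureDir(File dir) {
        return dir.isDirectory() || dir.mkdirs() || dir.isDirectory();
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"), "zk-datadir-sketch");
        System.out.println(ensureDir(dir));  // creates the directory
        System.out.println(ensureDir(dir));  // already exists: still success
        dir.delete();
    }
}
```

Applied in FileTxnSnapLog's constructor, a check of this shape would let both the PurgeTask and server startup proceed regardless of which one creates the directory first.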