[jira] [Commented] (ZOOKEEPER-1381) Add a method to get the zookeeper server version from the client

2012-06-25 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400750#comment-13400750
 ] 

Zhihong Ted Yu commented on ZOOKEEPER-1381:
---

Looks like UnimplementedRequestProcessor().processRequest() can be served by a 
singleton UnimplementedRequestProcessor.
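
A minimal sketch of that suggestion, assuming the processor is stateless (the 
accessor name and method bodies are illustrative, not a committed change):
{code}
public final class UnimplementedRequestProcessor implements RequestProcessor {
    // Stateless, so one shared instance can serve every caller.
    private static final UnimplementedRequestProcessor INSTANCE =
            new UnimplementedRequestProcessor();

    private UnimplementedRequestProcessor() {
        // prevent external instantiation
    }

    public static UnimplementedRequestProcessor getInstance() {
        return INSTANCE;
    }

    public void processRequest(Request request) {
        // reply "unimplemented" and close the connection, as the patch does
    }

    public void shutdown() {
        // nothing to release
    }
}
{code}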

 Add a method to get the zookeeper server version from the client
 

 Key: ZOOKEEPER-1381
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1381
 Project: ZooKeeper
  Issue Type: Improvement
  Components: c client, documentation, java client, server
Affects Versions: 3.4.2
 Environment: all
Reporter: nkeywal
Priority: Minor
  Labels: newbie
 Attachments: 1381.br33.v3.patch


 The Zookeeper client API is designed to be as server-version agnostic as 
 possible, so we can have new clients with old servers (or the opposite). But 
 there is today no simple way for a client to know the server version. 
 This would be very useful in order to:
 - check compatibility (e.g. the 'multi' implementation is available since 
 3.4, while the 3.4 client API supports 3.3 servers as well)
 - have different implementations depending on the server's functionality
 A workaround (proposed by Mahadev Konar) is to run {{echo stat | nc hostname 
 clientport}} and parse the output to get the version. The output is, for 
 example:
 ---
 Zookeeper version: 3.4.2--1, built on 01/30/2012 17:43 GMT
 Clients:
  /127.0.0.1:54951[0](queued=0,recved=1,sent=0)
 Latency min/avg/max: 0/0/0
 Received: 1
 Sent: 0
 Outstanding: 0
 Zxid: 0x50001
 Mode: follower
 Node count: 7
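
For reference, a rough sketch of that workaround in Java; this is a 
plain-socket probe, not an official client API, and the class name is made up:
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;

public class ServerVersionProbe {
    /** Sends the "stat" four-letter word and returns the version line, or null. */
    public static String serverVersion(String host, int port) throws Exception {
        Socket s = new Socket(host, port);
        try {
            s.getOutputStream().write("stat".getBytes("US-ASCII"));
            s.getOutputStream().flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), "US-ASCII"));
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("Zookeeper version:")) {
                    return line.substring("Zookeeper version:".length()).trim();
                }
            }
            return null;   // server closed the connection without a version line
        } finally {
            s.close();
        }
    }
}
{code}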
 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1560:
--

Attachment: zookeeper-1560-v5.txt

From 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1215//testReport/org.apache.zookeeper.test/ClientTest/testLargeNodeData/
 :
{code}
2012-10-12 14:10:50,042 [myid:] - WARN  
[main-SendThread(localhost:11221):ClientCnxn$SendThread@1089] - Session 
0x13a555031cf for server localhost/127.0.0.1:11221, unexpected error, 
closing socket connection and attempting reconnect
java.io.IOException: Couldn't write 2000 bytes, 1152 bytes written
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:142)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2012-10-12 14:10:50,044 [myid:] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@349] - caught end of 
stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x13a555031cf, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:662)
{code}
Patch v5 adds more information to the exception message.

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, 
 zookeeper-1560-v5.txt


 To reproduce, try creating a node with 0.5M of data using java client. The 
 test will hang waiting for a response from the server. See the attached patch 
 for the test that reproduces the issue.
 It seems that ZOOKEEPER-1437 introduced a few issues to 
 {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from 
 sending large packets that require several invocations of 
 {{SocketChannel.write}} to complete. The first issue is that the call to 
 {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue 
 even if the packet wasn't completely sent yet.  It looks to me that this call 
 should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that 
 {{p.createBB()}} is reinitializing {{ByteBuffer}} on every iteration, which 
 confuses {{SocketChannel.write}}. And the third issue is caused by extra 
 calls to {{cnxn.getXid()}} that increment xid on every iteration and confuse 
 the server.
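
A condensed sketch of the flow the three fixes above imply; it mirrors names 
from {{ClientCnxnSocketNIO}} and treats {{ClientCnxn.Packet}}'s {{bb}} / 
{{createBB()}} as given, but it is not the committed patch:
{code}
import java.io.IOException;
import java.nio.channels.SocketChannel;
import java.util.LinkedList;

class SendStepSketch {
    /**
     * One write attempt for the packet at the head of the queue.
     * Returns true once p has been fully written and dequeued.
     */
    static boolean sendStep(SocketChannel sock, Packet p,
                            LinkedList<Packet> outgoingQueue) throws IOException {
        if (p.bb == null) {
            // First pass only: the xid would likewise be assigned here, so
            // repeated passes never call cnxn.getXid() again (third issue).
            p.createBB();              // serialize exactly once (second issue)
        }
        sock.write(p.bb);              // a partial write is expected
        if (!p.bb.hasRemaining()) {
            // Dequeue only after the last byte is out (first issue).
            outgoingQueue.removeFirstOccurrence(p);
            return true;
        }
        return false;                  // retry on the next writable event
    }
}
{code}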

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1560:
--

Attachment: zookeeper-1560-v6.txt

Patch v6 changes the condition for raising IOE: if there is no progress between 
successive sock.write() calls.

I guess the socket's output buffer might be a limiting factor in the number of 
bytes written by a particular sock.write() call.
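
Roughly what the v6 condition amounts to, sketched as a standalone helper (the 
name is made up; v7 later relaxes the exception to a warning):
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

class NoProgressCheck {
    static void writeOrThrow(SocketChannel sock, ByteBuffer bb) throws IOException {
        int written = sock.write(bb);   // returns the byte count actually sent
        if (written == 0 && bb.hasRemaining()) {
            // no progress since the previous write: raise instead of hanging
            throw new IOException("Couldn't write " + bb.remaining()
                    + " bytes, 0 bytes written");
        }
    }
}
{code}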

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, 
 zookeeper-1560-v5.txt, zookeeper-1560-v6.txt


 To reproduce, try creating a node with 0.5M of data using java client. The 
 test will hang waiting for a response from the server. See the attached patch 
 for the test that reproduces the issue.
 It seems that ZOOKEEPER-1437 introduced a few issues to 
 {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from 
 sending large packets that require several invocations of 
 {{SocketChannel.write}} to complete. The first issue is that the call to 
 {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue 
 even if the packet wasn't completely sent yet.  It looks to me that this call 
 should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that 
 {{p.createBB()}} is reinitializing {{ByteBuffer}} on every iteration, which 
 confuses {{SocketChannel.write}}. And the third issue is caused by extra 
 calls to {{cnxn.getXid()}} that increment xid on every iteration and confuse 
 the server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1560:
--

Attachment: zookeeper-1560-v7.txt

Patch v7 changes the IOE to a warning.
Let's see if the test is able to make further progress.

I wonder whether 77152 bytes would be big enough for most use cases.

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, 
 zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt


 To reproduce, try creating a node with 0.5M of data using java client. The 
 test will hang waiting for a response from the server. See the attached patch 
 for the test that reproduces the issue.
 It seems that ZOOKEEPER-1437 introduced a few issues to 
 {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from 
 sending large packets that require several invocations of 
 {{SocketChannel.write}} to complete. The first issue is that the call to 
 {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue 
 even if the packet wasn't completely sent yet.  It looks to me that this call 
 should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that 
 {{p.createBB()}} is reinitializing {{ByteBuffer}} on every iteration, which 
 confuses {{SocketChannel.write}}. And the third issue is caused by extra 
 calls to {{cnxn.getXid()}} that increment xid on every iteration and confuse 
 the server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475146#comment-13475146
 ] 

Ted Yu commented on ZOOKEEPER-1560:
---

The good news is that patch v7 passed.
The not-so-good news is that I didn't find any occurrence of the warning 
message I added in v7.

Essentially patch v7 is the same as patch v2 - we shouldn't bail if a single 
sock.write() call didn't make progress.

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, 
 zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt


 To reproduce, try creating a node with 0.5M of data using java client. The 
 test will hang waiting for a response from the server. See the attached patch 
 for the test that reproduces the issue.
 It seems that ZOOKEEPER-1437 introduced a few issues to 
 {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from 
 sending large packets that require several invocations of 
 {{SocketChannel.write}} to complete. The first issue is that the call to 
 {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue 
 even if the packet wasn't completely sent yet.  It looks to me that this call 
 should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that 
 {{p.createBB()}} is reinitializing {{ByteBuffer}} on every iteration, which 
 confuses {{SocketChannel.write}}. And the third issue is caused by extra 
 calls to {{cnxn.getXid()}} that increment xid on every iteration and confuse 
 the server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

2012-10-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476591#comment-13476591
 ] 

Ted Yu commented on ZOOKEEPER-107:
--

I wonder if patch v7 from ZOOKEEPER-1560 would help prevent such a test failure.
Will combine the two patches and run these two tests locally.

 Allow dynamic changes to server cluster membership
 --

 Key: ZOOKEEPER-107
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: SimpleAddition.rtf, zkreconfig-usenixatc-final.pdf, 
 ZOOKEEPER-107-14-Oct.patch, ZOOKEEPER-107-15-Oct.patch, 
 ZOOKEEPER-107-15-Oct-ver1.patch, ZOOKEEPER-107-15-Oct-ver2.patch, 
 ZOOKEEPER-107-15-Oct-ver3.patch, ZOOKEEPER-107-1-Mar.patch, 
 ZOOKEEPER-107-20-July.patch, ZOOKEEPER-107-21-July.patch, 
 ZOOKEEPER-107-22-Apr.patch, ZOOKEEPER-107-23-SEP.patch, 
 ZOOKEEPER-107-28-Feb.patch, ZOOKEEPER-107-28-Feb.patch, 
 ZOOKEEPER-107-29-Feb.patch, ZOOKEEPER-107-3-Oct.patch, 
 ZOOKEEPER-107-Aug-20.patch, ZOOKEEPER-107-Aug-20-ver1.patch, 
 ZOOKEEPER-107-Aug-25.patch, zookeeper-3.4.0.jar, zookeeper-dev-fatjar.jar, 
 zookeeper-reconfig-sep11.patch, zookeeper-reconfig-sep12.patch, 
 zoo_replicated1.cfg, zoo_replicated1.members


 Currently cluster membership is statically defined, adding/removing hosts 
 to/from the server cluster dynamically needs to be supported.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

2012-10-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476702#comment-13476702
 ] 

Ted Yu commented on ZOOKEEPER-107:
--

ReconfigTest hung with the combined patch.
The test output is quite long (25MB).
I am not familiar with ReconfigTest, so I am not sure what to look for in the 
test output.
{code}
LOG.warn("Couldn't write " + expectedSize + " bytes, 
{code}
I verified that the above log message, which I added in patch v7 for 
ZOOKEEPER-1560, didn't appear in the test output.

 Allow dynamic changes to server cluster membership
 --

 Key: ZOOKEEPER-107
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
Assignee: Alexander Shraer
 Fix For: 3.5.0

 Attachments: SimpleAddition.rtf, zkreconfig-usenixatc-final.pdf, 
 ZOOKEEPER-107-14-Oct.patch, ZOOKEEPER-107-15-Oct.patch, 
 ZOOKEEPER-107-15-Oct-ver1.patch, ZOOKEEPER-107-15-Oct-ver2.patch, 
 ZOOKEEPER-107-15-Oct-ver3.patch, ZOOKEEPER-107-1-Mar.patch, 
 ZOOKEEPER-107-20-July.patch, ZOOKEEPER-107-21-July.patch, 
 ZOOKEEPER-107-22-Apr.patch, ZOOKEEPER-107-23-SEP.patch, 
 ZOOKEEPER-107-28-Feb.patch, ZOOKEEPER-107-28-Feb.patch, 
 ZOOKEEPER-107-29-Feb.patch, ZOOKEEPER-107-3-Oct.patch, 
 ZOOKEEPER-107-Aug-20.patch, ZOOKEEPER-107-Aug-20-ver1.patch, 
 ZOOKEEPER-107-Aug-25.patch, zookeeper-3.4.0.jar, zookeeper-dev-fatjar.jar, 
 zookeeper-reconfig-sep11.patch, zookeeper-reconfig-sep12.patch, 
 zoo_replicated1.cfg, zoo_replicated1.members


 Currently cluster membership is statically defined, adding/removing hosts 
 to/from the server cluster dynamically needs to be supported.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483597#comment-13483597
 ] 

Ted Yu commented on ZOOKEEPER-1560:
---

Looking at createBB(), the field bb won't be null upon exit.
I wonder why p.createBB() is enclosed in the if (p.bb != null) block above ?

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, 
 zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt


 To reproduce, try creating a node with 0.5M of data using java client. The 
 test will hang waiting for a response from the server. See the attached patch 
 for the test that reproduces the issue.
 It seems that ZOOKEEPER-1437 introduced a few issues to 
 {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from 
 sending large packets that require several invocations of 
 {{SocketChannel.write}} to complete. The first issue is that the call to 
 {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue 
 even if the packet wasn't completely sent yet.  It looks to me that this call 
 should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that 
 {{p.createBB()}} is reinitializing {{ByteBuffer}} on every iteration, which 
 confuses {{SocketChannel.write}}. And the third issue is caused by extra 
 calls to {{cnxn.getXid()}} that increment xid on every iteration and confuse 
 the server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483601#comment-13483601
 ] 

Ted Yu commented on ZOOKEEPER-1560:
---

bq. similar to what it was before - write as much as possible and then use the 
selector to wait for the socket to become writeable again
I looked at svn log for ClientCnxnSocketNIO.java back to 2011-04-12 and didn't 
seem to find the above change.
FYI
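
For context, a minimal sketch of the quoted approach, assuming a 
selector-driven client loop (not a claim about any particular ZooKeeper 
revision):
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

class WriteInterestSketch {
    /** Write what the kernel accepts now; let the selector signal when to resume. */
    static void writeSome(SelectionKey key, ByteBuffer bb) throws IOException {
        SocketChannel sock = (SocketChannel) key.channel();
        sock.write(bb);                                    // may be partial
        if (bb.hasRemaining()) {
            // socket is full: ask the selector to wake us when writable again
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        } else {
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }
}
{code}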

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, 
 zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt


 To reproduce, try creating a node with 0.5M of data using java client. The 
 test will hang waiting for a response from the server. See the attached patch 
 for the test that reproduces the issue.
 It seems that ZOOKEEPER-1437 introduced a few issues to 
 {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from 
 sending large packets that require several invocations of 
 {{SocketChannel.write}} to complete. The first issue is that the call to 
 {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue 
 even if the packet wasn't completely sent yet.  It looks to me that this call 
 should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that 
 {{p.createBB()}} is reinitializing {{ByteBuffer}} on every iteration, which 
 confuses {{SocketChannel.write}}. And the third issue is caused by extra 
 calls to {{cnxn.getXid()}} that increment xid on every iteration and confuse 
 the server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1568) multi should have a non-transaction version

2012-10-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483618#comment-13483618
 ] 

Ted Yu commented on ZOOKEEPER-1568:
---

bq. A multi will case one new snapshot/log to be generated
I guess you meant 'cause' above.
bq. but there was no guarantee they'd all succeed/fail.
I think we need to formalize how the success / failure status of individual 
operations in this new multi API should be delivered back to the client.
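
One conceivable shape, reusing the existing OpResult types; the 
non-transactional call itself is purely hypothetical:
{code}
// Hypothetical: a non-transactional variant returning one OpResult per op.
List<OpResult> results = zk.multiNoTransaction(ops);
for (OpResult r : results) {
    if (r instanceof OpResult.ErrorResult) {
        int rc = ((OpResult.ErrorResult) r).getErr();   // per-op error code
        // caller decides per operation instead of failing the whole batch
    }
}
{code}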

 multi should have a non-transaction version
 ---

 Key: ZOOKEEPER-1568
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1568
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Jimmy Xiang

 Currently multi is transactional, i.e. all or none.  However, sometimes we 
 don't want that.  We want all operations to be executed; even if some 
 operation(s) fail, it is ok.  We just need to know the result of each 
 operation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1568) multi should have a non-transaction version

2012-10-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483635#comment-13483635
 ] 

Ted Yu commented on ZOOKEEPER-1568:
---

bq. it aborts on the first op that fails and rolls back
Should we allow operations after the failed operation to continue ?
The rationale is that the operations in the batch may not have dependencies 
among them.

 multi should have a non-transaction version
 ---

 Key: ZOOKEEPER-1568
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1568
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Jimmy Xiang

 Currently multi is transactional, i.e. all or none.  However, sometimes we 
 don't want that.  We want all operations to be executed; even if some 
 operation(s) fail, it is ok.  We just need to know the result of each 
 operation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483737#comment-13483737
 ] 

Ted Yu commented on ZOOKEEPER-1560:
---

I got the following based on the above code snippet:
{code}
Index: src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java
===
--- src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java (revision 
1401904)
+++ src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java (working copy)
@@ -111,18 +111,18 @@
 
cnxn.sendThread.clientTunneledAuthenticationInProgress());
 
 if (p != null) {
-outgoingQueue.removeFirstOccurrence(p);
 updateLastSend();
 if ((p.requestHeader != null) &&
 (p.requestHeader.getType() != OpCode.ping) &&
 (p.requestHeader.getType() != OpCode.auth)) {
 p.requestHeader.setXid(cnxn.getXid());
 }
-p.createBB();
+if (p.bb == null) p.createBB();
 ByteBuffer pbb = p.bb;
 sock.write(pbb);
 if (!pbb.hasRemaining()) {
 sentCount++;
+outgoingQueue.removeFirstOccurrence(p);
 if (p.requestHeader != null
 && p.requestHeader.getType() != OpCode.ping
 && p.requestHeader.getType() != OpCode.auth) {
@@ -141,8 +141,12 @@
 synchronized(pendingQueue) {
 pendingQueue.addAll(pending);
 }
-
 }
+if (outgoingQueue.isEmpty()) {
+  disableWrite();
+} else {
+enableWrite();
+}
 }
 
 private Packet findSendablePacket(LinkedList<Packet> outgoingQueue,
{code}
I still saw testLargeNodeData fail:
{code}
Testcase: testLargeNodeData took 0.714 sec
  Caused an ERROR
KeeperErrorCode = ConnectionLoss for /large
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /large
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
  at org.apache.zookeeper.test.ClientTest.testLargeNodeData(ClientTest.java:61)
{code}

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, 
 zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt


 To reproduce, try creating a node with 0.5M of data using java client. The 
 test will hang waiting for a response from the server. See the attached patch 
 for the test that reproduces the issue.
 It seems that ZOOKEEPER-1437 introduced a few issues to 
 {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from 
 sending large packets that require several invocations of 
 {{SocketChannel.write}} to complete. The first issue is that the call to 
 {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue 
 even if the packet wasn't completely sent yet.  It looks to me that this call 
 should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that 
 {{p.createBB()}} is reinitializing {{ByteBuffer}} on every iteration, which 
 confuses {{SocketChannel.write}}. And the third issue is caused by extra 
 calls to {{cnxn.getXid()}} that increment xid on every iteration and confuse 
 the server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1560) Zookeeper client hangs on creation of large nodes

2012-10-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483823#comment-13483823
 ] 

Ted Yu commented on ZOOKEEPER-1560:
---

I left some minor comments on review board.
Nice work, Skye.

 Zookeeper client hangs on creation of large nodes
 -

 Key: ZOOKEEPER-1560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1560
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.4.4, 3.5.0
Reporter: Igor Motov
Assignee: Ted Yu
 Fix For: 3.5.0, 3.4.5

 Attachments: ZOOKEEPER-1560.patch, zookeeper-1560-v1.txt, 
 zookeeper-1560-v2.txt, zookeeper-1560-v3.txt, zookeeper-1560-v4.txt, 
 zookeeper-1560-v5.txt, zookeeper-1560-v6.txt, zookeeper-1560-v7.txt, 
 ZOOKEEPER-1560-v8.patch


 To reproduce, try creating a node with 0.5M of data using java client. The 
 test will hang waiting for a response from the server. See the attached patch 
 for the test that reproduces the issue.
 It seems that ZOOKEEPER-1437 introduced a few issues to 
 {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from 
 sending large packets that require several invocations of 
 {{SocketChannel.write}} to complete. The first issue is that the call to 
 {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue 
 even if the packet wasn't completely sent yet.  It looks to me that this call 
 should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that 
 {{p.createBB()}} is reinitializing {{ByteBuffer}} on every iteration, which 
 confuses {{SocketChannel.write}}. And the third issue is caused by extra 
 calls to {{cnxn.getXid()}} that increment xid on every iteration and confuse 
 the server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1624) PrepRequestProcessor abort multi-operation incorrectly

2013-02-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569138#comment-13569138
 ] 

Ted Yu commented on ZOOKEEPER-1624:
---

The fix would be backported to 3.4, right ?

 PrepRequestProcessor abort multi-operation incorrectly
 --

 Key: ZOOKEEPER-1624
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1624
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Reporter: Thawan Kooburat
Assignee: Thawan Kooburat
Priority: Critical
  Labels: zk-review
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1624.patch, ZOOKEEPER-1624.patch


 We found this issue when trying to issue multiple instances of the following 
 multi-op concurrently
 multi {
 1. create sequential node /a- 
 2. create node /b
 }
 The expected result is that only the first multi-op request should succeed 
 and the rest of the requests should fail because /b already exists.
 However, the observed result is that subsequent multi-ops failed because 
 sequential node creation failed, which should not be possible.
 Below are the return codes for each sub-op when issuing 3 instances of the 
 above multi-op asynchronously:
 1. ZOK, ZOK
 2. ZOK, ZNODEEXISTS,
 3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY,
 After adding more debug logging, I found the cause: PrepRequestProcessor 
 rolls back the outstandingChanges of the second multi-op incorrectly, causing 
 sequential node name generation to be wrong. Below are the sequential node 
 names generated by PrepRequestProcessor:
 1. create /a-0001
 2. create /a-0003
 3. create /a-0001
 The bug is in the getPendingChanges() method. It failed to copy the 
 ChangeRecord for the parent node (/), so rollbackPendingChanges() cannot 
 restore the right previous change record of the parent node when aborting the 
 second multi-op.
 The impact of this bug is that sequential node creation on the same parent 
 node may fail until the previous one is committed. I am not sure whether 
 there are other implications.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1495) ZK client hangs when using a function not available on the server.

2013-02-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576219#comment-13576219
 ] 

Ted Yu commented on ZOOKEEPER-1495:
---

What about 1495.br33.v3.patch ?
It would be really useful for clients to see whether a certain operation is 
supported.

 ZK client hangs when using a function not available on the server.
 --

 Key: ZOOKEEPER-1495
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1495
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.2, 3.3.5
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 3.5.0, 3.4.6

 Attachments: 1495.br33.v3.patch, ZOOKEEPER-1495.2.patch, 
 ZOOKEEPER-1495_branch34.patch, ZOOKEEPER-1495.patch


 This happens for example when using zk#multi with a 3.4 client but a 3.3 
 server.
 The issue seems to be on the server side: the server drops packets with 
 an unknown OpCode in ZooKeeperServer#submitRequest
 {noformat}
 public void submitRequest(Request si) {
 // snip
 try {
 touch(si.cnxn);
 boolean validpacket = Request.isValid(si.type); // <=== Check on case OpCode.*
 if (validpacket) {
 // snip
 } else {
 LOG.warn("Dropping packet at server of type " + si.type);
 // if invalid packet drop the packet.
 }
 } catch (MissingSessionException e) {
 if (LOG.isDebugEnabled()) {
 LOG.debug("Dropping request: " + e.getMessage());
 }
 }
 }
 {noformat}
 The solution discussed in ZOOKEEPER-1381 would be to get an exception on the 
 client side and then close the session.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1665) Support recursive deletion in multi

2013-11-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825047#comment-13825047
 ] 

Ted Yu commented on ZOOKEEPER-1665:
---

bq. and return without rollback the previously committed operations.

If this limitation cannot be lifted with reasonable effort, I can resolve this 
JIRA.

 Support recursive deletion in multi
 ---

 Key: ZOOKEEPER-1665
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1665
 Project: ZooKeeper
  Issue Type: New Feature
Reporter: Ted Yu

 Use case in HBase is that we need to recursively delete multiple subtrees:
 {code}
 ZKUtil.deleteChildrenRecursively(watcher, acquiredZnode);
 ZKUtil.deleteChildrenRecursively(watcher, reachedZnode);
 ZKUtil.deleteChildrenRecursively(watcher, abortZnode);
 {code}
 To achieve high consistency, it is desirable to use multi for the above 
 operations.
 This JIRA adds support for recursive deletion in multi.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2014-01-11 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-1859:
-

 Summary: pwriter should be closed in 
NIOServerCnxn#checkFourLetterWord()
 Key: ZOOKEEPER-1859
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in the telnetCloseCmd case as well.
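
One possible shape of the fix (a sketch, not a committed change), assuming 
cleanupWriterSocket() closes a non-null writer:
{code}
} else if (len == telnetCloseCmd) {
    cleanupWriterSocket(pwriter);   // close pwriter instead of passing null
    return true;
}
{code}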



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-01-11 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-1861:
-

 Summary: ConcurrentHashMap isn't used properly in QuorumCnxManager
 Key: ZOOKEEPER-1861
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


queueSendMap is a ConcurrentHashMap.
At line 210:
{code}
if (!queueSendMap.containsKey(sid)) {
queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
SEND_CAPACITY));
{code}
By the time control enters the if block, another thread may have concurrently 
put an entry with the same sid into the ConcurrentHashMap.
putIfAbsent() should be used.

A similar issue occurs at line 307 as well.
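
A minimal sketch of the putIfAbsent-based fix, at the cost of occasionally 
allocating a queue that loses the race and is discarded:
{code}
ArrayBlockingQueue<ByteBuffer> bq =
        new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
ArrayBlockingQueue<ByteBuffer> prev = queueSendMap.putIfAbsent(sid, bq);
// If another thread won the race, use its queue; ours becomes garbage.
ArrayBlockingQueue<ByteBuffer> queue = (prev != null) ? prev : bq;
{code}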



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1861:
--

Attachment: zookeeper-1861-v1.txt

Sure.

Here is the patch.

 ConcurrentHashMap isn't used properly in QuorumCnxManager
 -

 Key: ZOOKEEPER-1861
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: zookeeper-1861-v1.txt


 queueSendMap is a ConcurrentHashMap.
 At line 210:
 {code}
 if (!queueSendMap.containsKey(sid)) {
 queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
 SEND_CAPACITY));
 {code}
 By the time control enters the if block, another thread may have concurrently 
 put an entry with the same sid into the ConcurrentHashMap.
 putIfAbsent() should be used.
 A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1861:
--

Attachment: zookeeper-1861-v2.txt

Patch v2 addresses Michi's comments

 ConcurrentHashMap isn't used properly in QuorumCnxManager
 -

 Key: ZOOKEEPER-1861
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt


 queueSendMap is a ConcurrentHashMap.
 At line 210:
 {code}
 if (!queueSendMap.containsKey(sid)) {
 queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
 SEND_CAPACITY));
 {code}
 By the time control enters the if block, another thread may have concurrently 
 put an entry with the same sid into the ConcurrentHashMap.
 putIfAbsent() should be used.
 A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-01-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870966#comment-13870966
 ] 

Ted Yu commented on ZOOKEEPER-1861:
---

[~michim]:
Can you take a look at patch v2 ?

 ConcurrentHashMap isn't used properly in QuorumCnxManager
 -

 Key: ZOOKEEPER-1861
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt


 queueSendMap is a ConcurrentHashMap.
 At line 210:
 {code}
 if (!queueSendMap.containsKey(sid)) {
 queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
 SEND_CAPACITY));
 {code}
 By the time control enters the if block, another thread may have concurrently 
 put an entry with the same sid into the ConcurrentHashMap.
 putIfAbsent() should be used.
 A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-01-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871657#comment-13871657
 ] 

Ted Yu commented on ZOOKEEPER-1861:
---

To avoid allocating an extra ArrayBlockingQueue, I am thinking of the following 
(see the sketch below):
* create a singleton ArrayBlockingQueue which serves as a marker
* if queueSendMap.putIfAbsent(sid, singleton) returns null, create the real 
ArrayBlockingQueue, named bq, and call queueSendMap.replace(sid, bq)
* if queueSendMap.putIfAbsent(sid, singleton) returns a non-null value, check 
whether the returned value is the singleton; if so, wait till 
queueSendMap.get(sid) returns a value which is not the singleton.
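
A sketch of that scheme; the names are made up and the busy-wait is only 
illustrative:
{code}
private static final ArrayBlockingQueue<ByteBuffer> SINGLETON =
        new ArrayBlockingQueue<ByteBuffer>(1);

ArrayBlockingQueue<ByteBuffer> queueFor(Long sid) {
    ArrayBlockingQueue<ByteBuffer> prev = queueSendMap.putIfAbsent(sid, SINGLETON);
    if (prev == null) {
        // we won the race: install the real queue in place of the marker
        ArrayBlockingQueue<ByteBuffer> bq =
                new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
        queueSendMap.replace(sid, SINGLETON, bq);
        return bq;
    }
    while (prev == SINGLETON) {
        // another thread is installing; wait for the real queue to appear
        Thread.yield();
        prev = queueSendMap.get(sid);
    }
    return prev;
}
{code}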

 ConcurrentHashMap isn't used properly in QuorumCnxManager
 -

 Key: ZOOKEEPER-1861
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt


 queueSendMap is a ConcurrentHashMap.
 At line 210:
 {code}
 if (!queueSendMap.containsKey(sid)) {
 queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
 SEND_CAPACITY));
 {code}
 By the time control enters the if block, another thread may have concurrently 
 put an entry with the same sid into the ConcurrentHashMap.
 putIfAbsent() should be used.
 A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-01-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872288#comment-13872288
 ] 

Ted Yu commented on ZOOKEEPER-1861:
---

The above suggestion would involve more complex logic.

Maybe the first two hunks in patch v2 can be integrated first ?

 ConcurrentHashMap isn't used properly in QuorumCnxManager
 -

 Key: ZOOKEEPER-1861
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt


 queueSendMap is a ConcurrentHashMap.
 At line 210:
 {code}
 if (!queueSendMap.containsKey(sid)) {
 queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
 SEND_CAPACITY));
 {code}
 By the time control enters the if block, another thread may have concurrently 
 put an entry with the same sid into the ConcurrentHashMap.
 putIfAbsent() should be used.
 A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-02-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897335#comment-13897335
 ] 

Ted Yu commented on ZOOKEEPER-1861:
---

Further review on this would be appreciated.

 ConcurrentHashMap isn't used properly in QuorumCnxManager
 -

 Key: ZOOKEEPER-1861
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt


 queueSendMap is a ConcurrentHashMap.
 At line 210:
 {code}
 if (!queueSendMap.containsKey(sid)) {
 queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
 SEND_CAPACITY));
 {code}
 By the time control enters the if block, another thread may have concurrently 
 put an entry with the same sid into the ConcurrentHashMap.
 putIfAbsent() should be used.
 A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-02-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1861:
--

Attachment: zookeeper-1861-v3.txt

How about patch v3 ?

 ConcurrentHashMap isn't used properly in QuorumCnxManager
 -

 Key: ZOOKEEPER-1861
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt, 
 zookeeper-1861-v3.txt


 queueSendMap is a ConcurrentHashMap.
 At line 210:
 {code}
 if (!queueSendMap.containsKey(sid)) {
 queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
 SEND_CAPACITY));
 {code}
 By the time control enters the if block, another thread may have concurrently 
 put an entry with the same sid into the ConcurrentHashMap.
 putIfAbsent() should be used.
 A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (ZOOKEEPER-1861) ConcurrentHashMap isn't used properly in QuorumCnxManager

2014-02-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898727#comment-13898727
 ] 

Ted Yu commented on ZOOKEEPER-1861:
---

bq. I prefer patch v2

I agree.

Patch v3 basically makes the map a HashMap.

bq. then create, and put if not absent

I guess you meant 'put if absent'

The chance of extra allocation should be low.

 ConcurrentHashMap isn't used properly in QuorumCnxManager
 -

 Key: ZOOKEEPER-1861
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1861
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: zookeeper-1861-v1.txt, zookeeper-1861-v2.txt, 
 zookeeper-1861-v3.txt


 queueSendMap is a ConcurrentHashMap.
 At line 210:
 {code}
 if (!queueSendMap.containsKey(sid)) {
 queueSendMap.put(sid, new ArrayBlockingQueue<ByteBuffer>(
 SEND_CAPACITY));
 {code}
 By the time control enters the if block, another thread may have concurrently 
 put an entry with the same sid into the ConcurrentHashMap.
 putIfAbsent() should be used.
 A similar issue occurs at line 307 as well.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (ZOOKEEPER-1665) Support recursive deletion in multi

2014-03-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931718#comment-13931718
 ] 

Ted Yu commented on ZOOKEEPER-1665:
---

The snippet looks good. 
Patch is welcome. 

Thanks

 Support recursive deletion in multi
 ---

 Key: ZOOKEEPER-1665
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1665
 Project: ZooKeeper
  Issue Type: New Feature
Reporter: Ted Yu

 Use case in HBase is that we need to recursively delete multiple subtrees:
 {code}
 ZKUtil.deleteChildrenRecursively(watcher, acquiredZnode);
 ZKUtil.deleteChildrenRecursively(watcher, reachedZnode);
 ZKUtil.deleteChildrenRecursively(watcher, abortZnode);
 {code}
 To achieve high consistency, it is desirable to use multi for the above 
 operations.
 This JIRA adds support for recursive deletion in multi.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (ZOOKEEPER-1665) Support recursive deletion in multi

2014-04-02 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved ZOOKEEPER-1665.
---

Resolution: Won't Fix

 Support recursive deletion in multi
 ---

 Key: ZOOKEEPER-1665
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1665
 Project: ZooKeeper
  Issue Type: New Feature
Reporter: Ted Yu

 Use case in HBase is that we need to recursively delete multiple subtrees:
 {code}
 ZKUtil.deleteChildrenRecursively(watcher, acquiredZnode);
 ZKUtil.deleteChildrenRecursively(watcher, reachedZnode);
 ZKUtil.deleteChildrenRecursively(watcher, abortZnode);
 {code}
 To achieve high consistency, it is desirable to use multi for the above 
 operations.
 This JIRA adds support for recursive deletion in multi.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (ZOOKEEPER-2064) Prevent resource leak in various classes

2014-10-21 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2064:
-

 Summary: Prevent resource leak in various classes
 Key: ZOOKEEPER-2064
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu


In various classes there are potential resource leaks.
e.g. a LogIterator / RandomAccessFileReader is not closed upon return from the 
method.

Corresponding close() should be called to prevent resource leak.
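
The general pattern being asked for, sketched with illustrative method names 
(LogIterator is the class named above, but its exact API may differ):
{code}
LogIterator it = new LogIterator(file);   // illustrative constructor
try {
    while (it.hasNext()) {
        process(it.next());               // hypothetical per-record handling
    }
} finally {
    it.close();    // runs on every exit path, preventing the leak
}
{code}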



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2064) Prevent resource leak in various classes

2014-10-21 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2064:
--
Attachment: 2064-v1.txt

Tentative patch.

 Prevent resource leak in various classes
 

 Key: ZOOKEEPER-2064
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
 Attachments: 2064-v1.txt


 In various classes there are potential resource leaks.
 e.g. a LogIterator / RandomAccessFileReader is not closed upon return from the 
 method.
 Corresponding close() should be called to prevent resource leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2064) Prevent resource leak in various classes

2014-11-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2064:
--
Attachment: 2064-v2.txt

patch v2 is based on latest trunk.

 Prevent resource leak in various classes
 

 Key: ZOOKEEPER-2064
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Priority: Critical
 Attachments: 2064-v1.txt, 2064-v2.txt


 In various classes there are potential resource leaks.
 e.g. a LogIterator / RandomAccessFileReader is not closed upon return from the 
 method.
 Corresponding close() should be called to prevent resource leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2064) Prevent resource leak in various classes

2014-11-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208924#comment-14208924
 ] 

Ted Yu commented on ZOOKEEPER-2064:
---

I ran the failed tests locally.
ReconfigRecoveryTest#testCurrentObserverIsParticipantInNewConfig fails with or 
without my patch.

 Prevent resource leak in various classes
 

 Key: ZOOKEEPER-2064
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Priority: Critical
 Attachments: 2064-v1.txt, 2064-v2.txt


 In various classes there are potential resource leaks.
 e.g. a LogIterator / RandomAccessFileReader is not closed upon return from the 
 method.
 Corresponding close() should be called to prevent resource leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2064) Prevent resource leak in various classes

2014-11-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208931#comment-14208931
 ] 

Ted Yu commented on ZOOKEEPER-2064:
---

Correction:
ReconfigRecoveryTest#testCurrentServersAreObserversInNextConfig failed with the 
patch.
ReconfigRecoveryTest#testCurrentObserverIsParticipantInNewConfig failed without 
the patch.

 Prevent resource leak in various classes
 

 Key: ZOOKEEPER-2064
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Priority: Critical
 Attachments: 2064-v1.txt, 2064-v2.txt


 In various classes there are potential resource leaks.
 e.g. a LogIterator / RandomAccessFileReader is not closed upon return from the 
 method.
 Corresponding close() should be called to prevent resource leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently

2014-11-12 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2080:
-

 Summary: ReconfigRecoveryTest fails intermittently
 Key: ZOOKEEPER-2080
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
 Project: ZooKeeper
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


I got the following test failure on MacBook with trunk code:
{code}
Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
  FAILED
waiting for server 2 being up
junit.framework.AssertionFailedError: waiting for server 2 being up
  at 
org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
  at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2064) Prevent resource leak in various classes

2014-11-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220142#comment-14220142
 ] 

Ted Yu commented on ZOOKEEPER-2064:
---

Thanks Flavio.

 Prevent resource leak in various classes
 

 Key: ZOOKEEPER-2064
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2064
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Critical
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: 2064-v1.txt, 2064-v2.txt, ZOOKEEPER-2064.patch


 In various classes there are potential resource leaks.
 e.g. a LogIterator / RandomAccessFileReader is not closed upon return from the 
 method.
 Corresponding close() should be called to prevent resource leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2105) PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord

2015-01-18 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2105:
--
Attachment: zookeeper-2105-v1.patch

 PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord
 --

 Key: ZOOKEEPER-2105
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2105
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: zookeeper-2105-v1.patch


 {code}
 final PrintWriter pwriter = new PrintWriter(
 new BufferedWriter(new SendBufferWriter()));
 {code}
 pwriter should be closed upon return from the method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2105) PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord

2015-01-09 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2105:
-

 Summary: PrintWriter left unclosed in 
NIOServerCnxn#checkFourLetterWord
 Key: ZOOKEEPER-2105
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2105
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
{code}
pwriter should be closed upon return from the method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2105) PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord

2015-01-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272192#comment-14272192
 ] 

Ted Yu commented on ZOOKEEPER-2105:
---

NettyServerCnxn#checkFourLetterWord() has similar issue.

 PrintWriter left unclosed in NIOServerCnxn#checkFourLetterWord
 --

 Key: ZOOKEEPER-2105
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2105
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor

 {code}
 final PrintWriter pwriter = new PrintWriter(
 new BufferedWriter(new SendBufferWriter()));
 {code}
 pwriter should be closed upon return from the method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently

2015-03-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved ZOOKEEPER-2080.
---
Resolution: Cannot Reproduce

 ReconfigRecoveryTest fails intermittently
 -

 Key: ZOOKEEPER-2080
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Ted Yu
Priority: Minor

 I got the following test failure on MacBook with trunk code:
 {code}
 Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
   FAILED
 waiting for server 2 being up
 junit.framework.AssertionFailedError: waiting for server 2 being up
   at 
 org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
   at 
 org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently

2015-03-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369593#comment-14369593
 ] 

Ted Yu commented on ZOOKEEPER-2080:
---

Looks like the test hasn't failed recently.

 ReconfigRecoveryTest fails intermittently
 -

 Key: ZOOKEEPER-2080
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
 Project: ZooKeeper
  Issue Type: Sub-task
Reporter: Ted Yu
Priority: Minor

 I got the following test failure on MacBook with trunk code:
 {code}
 Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
   FAILED
 waiting for server 2 being up
 junit.framework.AssertionFailedError: waiting for server 2 being up
   at 
 org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
   at 
 org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2170) Zookeeper is not logging as per the configuration in log4j.properties

2015-09-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935526#comment-14935526
 ] 

Ted Yu commented on ZOOKEEPER-2170:
---

If I am not mistaken, 3.4.6 has this issue as well.

When can I expect this to be fixed?

Thanks
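
For reference, the kind of edit being described looks like the fragment below. 
This is a sketch modeled on the stock conf/log4j.properties layout (log4j 1.2 
syntax); the file name and pattern are illustrative, not taken from this report:
{code}
# Route the root logger to a rolling file at DEBUG instead of the console
zookeeper.root.logger=DEBUG, ROLLINGFILE
log4j.rootLogger=${zookeeper.root.logger}

# ROLLINGFILE appender definition
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=DEBUG
log4j.appender.ROLLINGFILE.File=zookeeper.log
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n
{code}
If memory serves, the 3.4 start scripts also pass -Dzookeeper.root.logger on 
the command line (via ZOO_LOG4J_PROP in zkEnv.sh), and a system property set 
there takes precedence over the file; that is one plausible reason edits like 
this appear to have no effect.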

> Zookeeper is not logging as per the configuration in log4j.properties
> -
>
> Key: ZOOKEEPER-2170
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2170
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Arshad Mohammad
>Assignee: Arshad Mohammad
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2170-002.patch, ZOOKEEPER-2170-003.patch, 
> ZOOKEEPER-2170.001.patch
>
>
> In conf/log4j.properties the default root logger is 
> {code}
> zookeeper.root.logger=INFO, CONSOLE
> {code}
> Changing the root logger to the value below, or any other value, has no 
> effect on the logging output
> {code}
> zookeeper.root.logger=DEBUG, ROLLINGFILE
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2347:
--
Attachment: testSplitLogManager.stack

Stack trace showing the issue

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Priority: Critical
> Attachments: testSplitLogManager.stack
>
>
> HBase recently upgraded to ZooKeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at the 
> end of the test.
> Below is a snippet from the stack trace related to ZooKeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-16 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2347:
-

 Summary: Deadlock shutting down zookeeper
 Key: ZOOKEEPER-2347
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.7
Reporter: Ted Yu
Priority: Critical


HBase recently upgraded to ZooKeeper 3.4.7.

In one of the tests, TestSplitLogManager, there is a reproducible hang at the end 
of the test.
Below is a snippet from the stack trace related to ZooKeeper:
{code}
"main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
condition [0x00011834b000]
   java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x0007c5b8d3a0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)

"main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
nid=0x9513 waiting on condition [0x000118042000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
  at java.lang.Thread.sleep(Native Method)
  at 
org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
  at 
org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)

"SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
entry [0x0001170ac000]
   java.lang.Thread.State: BLOCKED (on object monitor)
  at 
org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
  - waiting to lock <0x0007c5b62128> (a 
org.apache.zookeeper.server.ZooKeeperServer)
  at 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
  at 
org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
  at 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)

"main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
condition [0x000117a3]
   java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x0007c9b106b8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)

"main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
[0x000108aa1000]
   java.lang.Thread.State: WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  - waiting on <0x0007c5b66400> (a 
org.apache.zookeeper.server.SyncRequestProcessor)
  at java.lang.Thread.join(Thread.java:1281)
  - locked <0x0007c5b66400> (a 
org.apache.zookeeper.server.SyncRequestProcessor)
  at java.lang.Thread.join(Thread.java:1355)
  at 
org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
  at 
org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
  at 
org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
  - locked <0x0007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer)
  at 
org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
  at 
org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
{code}
Note the address (0x0007c5b66400) in the last hunk which seems to indicate 
some form of deadlock.

According to Camille Fournier:

We made shutdown synchronized. But decrementing the requests is
also synchronized and called from a different thread. So yeah, deadlock.
This came in with ZOOKEEPER-1907
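
To make the cycle concrete, here is a minimal, self-contained sketch of the 
same shape (illustrative class, not ZooKeeper's actual code). The main thread 
holds the object's monitor inside a synchronized shutdown() while joining a 
worker, and the worker needs that same monitor to finish its last request, so 
running this hangs by design:
{code}
public class ShutdownDeadlockSketch {
    // stand-in for ZooKeeperServer.decInProcess(): synchronized on 'this'
    private synchronized void decInProcess() { /* notional bookkeeping */ }

    private final Thread syncThread = new Thread(() -> {
        try { Thread.sleep(100); } catch (InterruptedException e) { return; }
        decInProcess(); // blocks: main already holds the monitor in shutdown()
    }, "SyncThread");

    // stand-in for the synchronized ZooKeeperServer.shutdown()
    private synchronized void shutdown() throws InterruptedException {
        syncThread.join(); // waits forever: the worker is stuck on our monitor
    }

    public static void main(String[] args) throws InterruptedException {
        ShutdownDeadlockSketch s = new ShutdownDeadlockSketch();
        s.syncThread.start();
        s.shutdown(); // deadlock: join() inside a synchronized method
    }
}
{code}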



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062994#comment-15062994
 ] 

Ted Yu commented on ZOOKEEPER-2347:
---

Not sure how I can test this with HBase unit test(s).

As far as I know, ZooKeeper still builds with Ant, while the HBase dependency is 
expressed through Maven.

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>
> HBase recently upgraded to ZooKeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at the 
> end of the test.
> Below is a snippet from the stack trace related to ZooKeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063086#comment-15063086
 ] 

Ted Yu commented on ZOOKEEPER-2347:
---

Thanks for the pointer, Chris.

After modifying pom.xml, I ran TestSplitLogManager twice and it passed both 
times. Previously the test hung quite reliably on Mac.



> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>
> HBase recently upgraded to ZooKeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at the 
> end of the test.
> Below is a snippet from the stack trace related to ZooKeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2016-01-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081602#comment-15081602
 ] 

Ted Yu commented on ZOOKEEPER-2347:
---

[~fpj]:
Can you review the patch?

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, 
> ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>
> HBase recently upgraded to ZooKeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at the 
> end of the test.
> Below is a snippet from the stack trace related to ZooKeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2015-12-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075556#comment-15075556
 ] 

Ted Yu commented on ZOOKEEPER-2347:
---

Rakesh:
Thanks for updating the test case.

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, 
> ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>
> HBase recently upgraded to ZooKeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at the 
> end of the test.
> Below is a snippet from the stack trace related to ZooKeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089636#comment-15089636
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

[~fpj]:
Can you take a look?

Thanks

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge 
> might run at the same time as the server startup itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time; mkdir fails, and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089595#comment-15089595
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

We encountered this issue during testing, though intermittently.

Can the fix be committed?
[~shralex] [~phunt]

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge 
> might run at the same time as the server startup itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time; mkdir fails, and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2347) Deadlock shutting down zookeeper

2016-01-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095073#comment-15095073
 ] 

Ted Yu commented on ZOOKEEPER-2347:
---

Assuming only the test has changed since I performed validation last year, 
this should be good to go.

> Deadlock shutting down zookeeper
> 
>
> Key: ZOOKEEPER-2347
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.7
>Reporter: Ted Yu
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: 3.4.8
>
> Attachments: ZOOKEEPER-2347-br-3.4.patch, 
> ZOOKEEPER-2347-br-3.4.patch, ZOOKEEPER-2347-br-3.4.patch, 
> ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack
>
>
> HBase recently upgraded to ZooKeeper 3.4.7.
> In one of the tests, TestSplitLogManager, there is a reproducible hang at the 
> end of the test.
> Below is a snippet from the stack trace related to ZooKeeper:
> {code}
> "main-EventThread" daemon prio=5 tid=0x7fd27488a800 nid=0x6f1f waiting on 
> condition [0x00011834b000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c5b8d3a0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main-SendThread(localhost:59510)" daemon prio=5 tid=0x7fd274eb4000 
> nid=0x9513 waiting on condition [0x000118042000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
>   at 
> org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> "SyncThread:0" prio=5 tid=0x7fd274d02000 nid=0x730f waiting for monitor 
> entry [0x0001170ac000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
>   - waiting to lock <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
> "main-EventThread" daemon prio=5 tid=0x7fd2753a3800 nid=0x711b waiting on 
> condition [0x000117a3]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007c9b106b8> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
> "main" prio=5 tid=0x7fd27600 nid=0x1903 in Object.wait() 
> [0x000108aa1000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1281)
>   - locked <0x0007c5b66400> (a 
> org.apache.zookeeper.server.SyncRequestProcessor)
>   at java.lang.Thread.join(Thread.java:1355)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
>   at 
> org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
>   at 
> org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
>   - locked <0x0007c5b62128> (a 
> org.apache.zookeeper.server.ZooKeeperServer)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
>   at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
> {code}
> Note the address (0x0007c5b66400) in the last hunk which seems to 
> indicate some form of deadlock.
> According to Camille Fournier:
> We made shutdown synchronized. But decrementing the requests is
> also synchronized and called from a different thread. So yeah, deadlock.
> This came in with ZOOKEEPER-1907



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-15 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.v3.patch

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge 
> might run at the same time as the server startup itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time; mkdir fails, and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102276#comment-15102276
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

Patch v3 addresses comments from Chris and Rakesh.

The same patch applies cleanly on branch-3.4.

Let me know if a separate patch for branch-3.4 should be attached.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge 
> might run at the same time as the server startup itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time; mkdir fails, and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-14 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.v2.patch

Alternate patch for consideration.

Only throw an exception if dataDir doesn't exist and the mkdirs() call fails.
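
A sketch of that idea (hypothetical helper, not the attached patch): treat 
mkdirs() returning false as fatal only if the directory still does not exist 
afterwards, so losing the race to a concurrent creator is harmless:
{code}
import java.io.File;
import java.io.IOException;

public class EnsureDirSketch {
    // Race-tolerant creation: if another thread or process creates the
    // directory between our call sites, mkdirs() returns false but
    // isDirectory() is true, and we carry on instead of exiting the JVM.
    static void ensureDir(File dir) throws IOException {
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }

    public static void main(String[] args) throws IOException {
        ensureDir(new File("version-2")); // illustrative path
    }
}
{code}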

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge 
> might run at the same time as the server startup itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time; mkdir fails, and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123934#comment-15123934
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

I haven't had a chance to reproduce the bug.

After some QE fixes, the HBase un-secure deployment works reliably.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge 
> might run at the same time as the server startup itself. In FileTxnSnapLog() it 
> will check if the directory exists and create it if not. These two tasks do 
> this at the same time; mkdir fails, and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.v4.patch

Patch v4 addresses Chris' comment above.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is enabled (as it is in our case), the purge 
> task can run at the same time as server startup. FileTxnSnapLog() checks 
> whether the data directory exists and creates it if not. When both tasks do 
> this at the same time, mkdir fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.v4.patch

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is enabled (as it is in our case), the purge 
> task can run at the same time as server startup. FileTxnSnapLog() checks 
> whether the data directory exists and creates it if not. When both tasks do 
> this at the same time, mkdir fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-29 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: (was: ZOOKEEPER-1936.v4.patch)

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is enabled (as it is in our case), the purge 
> task can run at the same time as server startup. FileTxnSnapLog() checks 
> whether the data directory exists and creates it if not. When both tasks do 
> this at the same time, mkdir fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108068#comment-15108068
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

Previous patch was generated for branch-3.4

Attached patch for trunk.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is enabled (as it is in our case), the purge 
> task can run at the same time as server startup. FileTxnSnapLog() checks 
> whether the data directory exists and creates it if not. When both tasks do 
> this at the same time, mkdir fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.v3.patch

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is enabled (as it is in our case), the purge 
> task can run at the same time as server startup. FileTxnSnapLog() checks 
> whether the data directory exists and creates it if not. When both tasks do 
> this at the same time, mkdir fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-01-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108940#comment-15108940
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

The test failure at 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3010//testReport/org.apache.zookeeper.test/AsyncHammerTest/testHammer/
doesn't seem to be related to the patch.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch, 
> ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is enabled (as it is in our case), the purge 
> task can run at the same time as server startup. FileTxnSnapLog() checks 
> whether the data directory exists and creates it if not. When both tasks do 
> this at the same time, mkdir fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value

2016-03-09 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2384:
-

 Summary: Support atomic increment / decrement of znode value
 Key: ZOOKEEPER-2384
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2384
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Ted Yu


The use case is to store a reference count (integer type) in a znode.

It is desirable to support atomic increment / decrement of the znode value.

Suggestion from Flavio:

you can read the znode, keep the version of the znode, update the value, and 
write back conditionally. The condition for the setData operation to succeed 
is that the version is the same as the one that was read.

While the above is feasible, the developer has to implement the retry logic 
themselves. It is not easy to combine increment / decrement with other 
operations using multi.
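As a concrete illustration of the suggested conditional write, here is a
minimal client-side retry loop using the standard ZooKeeper Java API. It
assumes the counter is stored as an 8-byte big-endian long; the class and
method names are illustrative, not part of any proposed API:

{code}
import java.nio.ByteBuffer;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZnodeCounter {
    // Atomically adds delta to the counter stored in the znode and
    // returns the new value.
    static long addAndGet(ZooKeeper zk, String path, long delta)
            throws KeeperException, InterruptedException {
        while (true) {
            Stat stat = new Stat();
            byte[] data = zk.getData(path, false, stat);
            long next = ByteBuffer.wrap(data).getLong() + delta;
            byte[] out = ByteBuffer.allocate(Long.BYTES).putLong(next).array();
            try {
                // Conditional write: succeeds only if the version is still
                // the one we read, i.e. nobody else updated the znode.
                zk.setData(path, out, stat.getVersion());
                return next;
            } catch (KeeperException.BadVersionException e) {
                // Lost the race with another writer; re-read and retry.
            }
        }
    }
}
{code}

This is the retry logic the description says each developer currently has to
write; having it (or an equivalent increment operation) in the client library
is what this issue asks for.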



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value

2016-03-18 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2384:
--
Labels: atomic  (was: )

> Support atomic increment / decrement of znode value
> ---
>
> Key: ZOOKEEPER-2384
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2384
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Ted Yu
>  Labels: atomic
>
> The use case is to store a reference count (integer type) in a znode.
> It is desirable to support atomic increment / decrement of the znode value.
> Suggestion from Flavio:
> you can read the znode, keep the version of the znode, update the value, and 
> write back conditionally. The condition for the setData operation to succeed 
> is that the version is the same as the one that was read.
> While the above is feasible, the developer has to implement the retry logic 
> themselves. It is not easy to combine increment / decrement with other 
> operations using multi.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value

2016-05-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2384:
--
Description: 
The use case is to store a reference count (integer type) in a znode.

It is desirable to support atomic increment / decrement of the znode value.

Suggestion from Flavio:
{quote}
you can read the znode, keep the version of the znode, update the value, and 
write back conditionally. The condition for the setData operation to succeed 
is that the version is the same as the one that was read.
{quote}
While the above is feasible, the developer has to implement the retry logic 
themselves. It is not easy to combine increment / decrement with other 
operations using multi.

  was:
The use case is to store a reference count (integer type) in a znode.

It is desirable to support atomic increment / decrement of the znode value.

Suggestion from Flavio:

you can read the znode, keep the version of the znode, update the value, and 
write back conditionally. The condition for the setData operation to succeed 
is that the version is the same as the one that was read.

While the above is feasible, the developer has to implement the retry logic 
themselves. It is not easy to combine increment / decrement with other 
operations using multi.


> Support atomic increment / decrement of znode value
> ---
>
> Key: ZOOKEEPER-2384
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2384
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Ted Yu
>  Labels: atomic
>
> The use case is to store a reference count (integer type) in a znode.
> It is desirable to support atomic increment / decrement of the znode value.
> Suggestion from Flavio:
> {quote}
> you can read the znode, keep the version of the znode, update the value, and 
> write back conditionally. The condition for the setData operation to succeed 
> is that the version is the same as the one that was read.
> {quote}
> While the above is feasible, the developer has to implement the retry logic 
> themselves. It is not easy to combine increment / decrement with other 
> operations using multi.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-10-05 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550505#comment-15550505
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

Is there anything I can do to move this forward?

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch, 
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, 
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is enabled (as it is in our case), the purge 
> task can run at the same time as server startup. FileTxnSnapLog() checks 
> whether the data directory exists and creates it if not. When both tasks do 
> this at the same time, mkdir fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2606) SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception

2016-10-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2606:
--
Priority: Minor  (was: Major)

> SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception
> 
>
> Key: ZOOKEEPER-2606
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2606
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
>
> {code}
> LOG.info("Setting authorizedID: " + userNameBuilder);
> ac.setAuthorizedID(userNameBuilder.toString());
> } catch (IOException e) {
> LOG.error("Failed to set name based on Kerberos authentication rules.");
> }
> {code}
> On one cluster, we saw the following:
> {code}
> 2016-10-04 02:18:16,484 - ERROR 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@137] - 
> Failed to set name based on Kerberos authentication rules.
> {code}
> It would be helpful if the log contained information about the IOException.
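For illustration, the standard way to get that information into the log is to
pass the exception as the final argument, so the logger prints its message
and stack trace instead of only the static text. A minimal SLF4J-style sketch
(class name and demo exception are illustrative):

{code}
import java.io.IOException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CallbackLogging {
    private static final Logger LOG =
        LoggerFactory.getLogger(CallbackLogging.class);

    static void handle() {
        try {
            throw new IOException("no rules applied to user@EXAMPLE.COM");
        } catch (IOException e) {
            // Passing 'e' as the last argument logs the cause and stack
            // trace along with the message.
            LOG.error("Failed to set name based on Kerberos authentication rules.", e);
        }
    }
}
{code}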



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2606) SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception

2016-10-04 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2606:
-

 Summary: SaslServerCallbackHandler#handleAuthorizeCallback() 
should log the exception
 Key: ZOOKEEPER-2606
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2606
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


{code}
LOG.info("Setting authorizedID: " + userNameBuilder);
ac.setAuthorizedID(userNameBuilder.toString());
} catch (IOException e) {
LOG.error("Failed to set name based on Kerberos authentication rules.");
}
{code}
On one cluster, we saw the following:
{code}
2016-10-04 02:18:16,484 - ERROR 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@137] - 
Failed to set name based on Kerberos authentication rules.
{code}
It would be helpful if the log contained information about the IOException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2606) SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception

2016-10-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2606:
--
Attachment: ZOOKEEPER-2606.v1.patch

> SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception
> 
>
> Key: ZOOKEEPER-2606
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2606
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: ZOOKEEPER-2606.v1.patch
>
>
> {code}
> LOG.info("Setting authorizedID: " + userNameBuilder);
> ac.setAuthorizedID(userNameBuilder.toString());
> } catch (IOException e) {
> LOG.error("Failed to set name based on Kerberos authentication rules.");
> }
> {code}
> On one cluster, we saw the following:
> {code}
> 2016-10-04 02:18:16,484 - ERROR 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@137] - 
> Failed to set name based on Kerberos authentication rules.
> {code}
> It would be helpful if the log contained information about the IOException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-09-30 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.v5.patch

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch, 
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, 
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is enabled (as it is in our case), the purge 
> task can run at the same time as server startup. FileTxnSnapLog() checks 
> whether the data directory exists and creates it if not. When both tasks do 
> this at the same time, mkdir fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2016-09-30 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1936:
--
Attachment: ZOOKEEPER-1936.branch-3.4.patch

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch, 
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, 
> ZOOKEEPER-1936.v4.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR   : -   
> .org.apache.zookeeper.server.ZooKeeperServerMain Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is enabled (as it is in our case), the purge 
> task can run at the same time as server startup. FileTxnSnapLog() checks 
> whether the data directory exists and creates it if not. When both tasks do 
> this at the same time, mkdir fails and the server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2080) ReconfigRecoveryTest fails intermittently

2016-10-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613237#comment-15613237
 ] 

Ted Yu commented on ZOOKEEPER-2080:
---

Thanks for the effort, Michael.

> ReconfigRecoveryTest fails intermittently
> -
>
> Key: ZOOKEEPER-2080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2080
> Project: ZooKeeper
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Michael Han
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, 
> ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, ZOOKEEPER-2080.patch, 
> ZOOKEEPER-2080.patch, jacoco-ZOOKEEPER-2080.unzip-grows-to-70MB.7z, 
> repro-20150816.log, threaddump.log
>
>
> I got the following test failure on a MacBook with trunk code:
> {code}
> Testcase: testCurrentObserverIsParticipantInNewConfig took 93.628 sec
>   FAILED
> waiting for server 2 being up
> junit.framework.AssertionFailedError: waiting for server 2 being up
>   at 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentObserverIsParticipantInNewConfig(ReconfigRecoveryTest.java:529)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2384) Support atomic increment / decrement of znode value

2016-11-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638611#comment-15638611
 ] 

Ted Yu commented on ZOOKEEPER-2384:
---

Thanks for the suggestion, Nick.

> Support atomic increment / decrement of znode value
> ---
>
> Key: ZOOKEEPER-2384
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2384
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Ted Yu
>  Labels: atomic
>
> The use case is to store a reference count (integer type) in a znode.
> It is desirable to support atomic increment / decrement of the znode value.
> Suggestion from Flavio:
> {quote}
> you can read the znode, keep the version of the znode, update the value, and 
> write back conditionally. The condition for the setData operation to succeed 
> is that the version is the same as the one that was read.
> {quote}
> While the above is feasible, the developer has to implement the retry logic 
> themselves. It is not easy to combine increment / decrement with other 
> operations using multi.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2606) SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception

2016-10-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2606:
--
Labels: security  (was: )

> SaslServerCallbackHandler#handleAuthorizeCallback() should log the exception
> 
>
> Key: ZOOKEEPER-2606
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2606
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
>  Labels: security
> Attachments: ZOOKEEPER-2606.v1.patch
>
>
> {code}
> LOG.info("Setting authorizedID: " + userNameBuilder);
> ac.setAuthorizedID(userNameBuilder.toString());
> } catch (IOException e) {
> LOG.error("Failed to set name based on Kerberos authentication rules.");
> }
> {code}
> On one cluster, we saw the following:
> {code}
> 2016-10-04 02:18:16,484 - ERROR 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@137] - 
> Failed to set name based on Kerberos authentication rules.
> {code}
> It would be helpful if the log contained information about the IOException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822028#comment-15822028
 ] 

Ted Yu commented on ZOOKEEPER-2664:
---

https://github.com/apache/zookeeper/pull/149

> ClientPortBindTest#testBindByAddress may fail due to "No such device" 
> exception
> ---
>
> Key: ZOOKEEPER-2664
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
> Project: ZooKeeper
>  Issue Type: Test
>Affects Versions: 3.4.6
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: ZOOKEEPER-2664.v1.txt
>
>
> Saw the following in a recent run:
> {code}
> Stacktrace
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Standard Output
> 2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
> testBindByAddress
> 2017-01-12 23:20:43,795 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
> testBindByAddress
> 2017-01-12 23:20:43,799 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
> testBindByAddress
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
> {code}
> The proposed fix is to catch the exception from the isLoopback() call.
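As a sketch of the proposed defensive pattern, the isLoopback() query can be
wrapped so that an interface that disappears between enumeration and the
query is simply skipped (helper name is illustrative, not the attached
patch):

{code}
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;

public class LoopbackScan {
    // Returns the first non-loopback interface, or null if none is found.
    static NetworkInterface firstNonLoopback() throws SocketException {
        for (NetworkInterface nif
                : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            try {
                if (!nif.isLoopback()) {
                    return nif;
                }
            } catch (SocketException e) {
                // "No such device": the interface vanished after it was
                // enumerated; skip it instead of failing the whole scan.
            }
        }
        return null;
    }
}
{code}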



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822029#comment-15822029
 ] 

Ted Yu commented on ZOOKEEPER-2664:
---

[~praste]:
Looks like you mistakenly posted this on ZOOKEEPER-2664, which is not about log4j.

> ClientPortBindTest#testBindByAddress may fail due to "No such device" 
> exception
> ---
>
> Key: ZOOKEEPER-2664
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
> Project: ZooKeeper
>  Issue Type: Test
>Affects Versions: 3.4.6
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: ZOOKEEPER-2664.v1.txt
>
>
> Saw the following in a recent run:
> {code}
> Stacktrace
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Standard Output
> 2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
> testBindByAddress
> 2017-01-12 23:20:43,795 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
> testBindByAddress
> 2017-01-12 23:20:43,799 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
> testBindByAddress
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
> {code}
> The proposed fix is to catch the exception from the isLoopback() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2664:
--
Comment: was deleted

(was: https://github.com/apache/zookeeper/pull/149)

> ClientPortBindTest#testBindByAddress may fail due to "No such device" 
> exception
> ---
>
> Key: ZOOKEEPER-2664
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
> Project: ZooKeeper
>  Issue Type: Test
>Affects Versions: 3.4.6
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: ZOOKEEPER-2664.v1.txt
>
>
> Saw the following in a recent run:
> {code}
> Stacktrace
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Standard Output
> 2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
> testBindByAddress
> 2017-01-12 23:20:43,795 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
> testBindByAddress
> 2017-01-12 23:20:43,799 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
> testBindByAddress
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
> {code}
> The proposed fix is to catch the exception from the isLoopback() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821881#comment-15821881
 ] 

Ted Yu commented on ZOOKEEPER-2664:
---

Since ZOOKEEPER-2395 didn't propose a patch, I think we can proceed with patch 
review here.

> ClientPortBindTest#testBindByAddress may fail due to "No such device" 
> exception
> ---
>
> Key: ZOOKEEPER-2664
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
> Project: ZooKeeper
>  Issue Type: Test
>Affects Versions: 3.4.6
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: ZOOKEEPER-2664.v1.txt
>
>
> Saw the following in a recent run:
> {code}
> Stacktrace
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Standard Output
> 2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
> testBindByAddress
> 2017-01-12 23:20:43,795 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
> testBindByAddress
> 2017-01-12 23:20:43,799 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
> testBindByAddress
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
> {code}
> The proposed fix is to catch the exception thrown by the isLoopback() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-2664:
--
Attachment: ZOOKEEPER-2664.v1.txt

> ClientPortBindTest#testBindByAddress may fail due to "No such device" 
> exception
> ---
>
> Key: ZOOKEEPER-2664
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
> Project: ZooKeeper
>  Issue Type: Test
>Affects Versions: 3.4.6
>Reporter: Ted Yu
> Attachments: ZOOKEEPER-2664.v1.txt
>
>
> Saw the following in a recent run:
> {code}
> Stacktrace
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Standard Output
> 2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
> testBindByAddress
> 2017-01-12 23:20:43,795 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
> testBindByAddress
> 2017-01-12 23:20:43,799 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
> testBindByAddress
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
> {code}
> The proposed fix is to catch the exception thrown by the isLoopback() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)
Ted Yu created ZOOKEEPER-2664:
-

 Summary: ClientPortBindTest#testBindByAddress may fail due to "No 
such device" exception
 Key: ZOOKEEPER-2664
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
 Project: ZooKeeper
  Issue Type: Test
Affects Versions: 3.4.6
Reporter: Ted Yu


Saw the following in a recent run:
{code}
Stacktrace

java.net.SocketException: No such device
at java.net.NetworkInterface.isLoopback0(Native Method)
at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
at 
org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
Standard Output

2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
testBindByAddress
2017-01-12 23:20:43,795 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
testBindByAddress
2017-01-12 23:20:43,799 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
testBindByAddress
java.net.SocketException: No such device
at java.net.NetworkInterface.isLoopback0(Native Method)
at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
at 
org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
{code}
The proposed fix is to catch the exception thrown by the isLoopback() call.
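
A minimal sketch of that guard (hypothetical class and variable names, not the 
attached ZOOKEEPER-2664.v1.txt patch itself): skip an interface when the 
isLoopback() query races with the device disappearing.
{code}
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Enumeration;

public class LoopbackScan {
    public static void main(String[] args) throws SocketException {
        // getNetworkInterfaces() enumerates the devices present right now...
        Enumeration<NetworkInterface> intfs = NetworkInterface.getNetworkInterfaces();
        while (intfs.hasMoreElements()) {
            NetworkInterface intf = intfs.nextElement();
            boolean loopback;
            try {
                loopback = intf.isLoopback();
            } catch (SocketException e) {
                // ...but a device can vanish before isLoopback() queries it
                // ("No such device"); skip it rather than fail the caller.
                continue;
            }
            System.out.println(intf.getName() + " loopback=" + loopback);
        }
    }
}
{code}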



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (ZOOKEEPER-2664) ClientPortBindTest#testBindByAddress may fail due to "No such device" exception

2017-01-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned ZOOKEEPER-2664:
-

Assignee: Ted Yu

> ClientPortBindTest#testBindByAddress may fail due to "No such device" 
> exception
> ---
>
> Key: ZOOKEEPER-2664
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2664
> Project: ZooKeeper
>  Issue Type: Test
>Affects Versions: 3.4.6
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: ZOOKEEPER-2664.v1.txt
>
>
> Saw the following in a recent run:
> {code}
> Stacktrace
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> Standard Output
> 2017-01-12 23:20:43,792 [myid:] - INFO  [main:ZKTestCase$1@50] - STARTING 
> testBindByAddress
> 2017-01-12 23:20:43,795 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@50] - RUNNING TEST METHOD 
> testBindByAddress
> 2017-01-12 23:20:43,799 [myid:] - INFO  
> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED 
> testBindByAddress
> java.net.SocketException: No such device
>   at java.net.NetworkInterface.isLoopback0(Native Method)
>   at java.net.NetworkInterface.isLoopback(NetworkInterface.java:390)
>   at 
> org.apache.zookeeper.test.ClientPortBindTest.testBindByAddress(ClientPortBindTest.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
> {code}
> The proposed fix is to catch the exception thrown by the isLoopback() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2017-04-05 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd
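
As a self-contained illustration of the cleanup pattern (hypothetical names, 
not the NIOServerCnxn code itself), try-with-resources guarantees the writer 
is closed even on the early telnet-close return taken in the snippet above:
{code}
import java.io.BufferedWriter;
import java.io.PrintWriter;
import java.io.StringWriter;

public class WriterCleanup {
    // Stand-in for the four-letter-word dispatch: returns true when the
    // command is the telnet close command.
    static boolean handle(int len, int telnetCloseCmd) {
        StringWriter sink = new StringWriter(); // stand-in for SendBufferWriter
        try (PrintWriter pwriter = new PrintWriter(new BufferedWriter(sink))) {
            if (len == telnetCloseCmd) {
                // try-with-resources closes pwriter even on this early
                // return, the path where the original code passes null to
                // cleanupWriterSocket() and leaks the writer.
                return true;
            }
            pwriter.println("response");
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(1, 1)); // telnet-close path; writer still closed
    }
}
{code}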



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2017-07-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2017-09-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2017-11-05 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2017-10-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2018-02-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2018-02-17 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-1859) pwriter should be closed in NIOServerCnxn#checkFourLetterWord()

2018-07-11 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated ZOOKEEPER-1859:
--
Description: 
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}
pwriter should be closed in case of telnetCloseCmd

  was:
{code}
final PrintWriter pwriter = new PrintWriter(
new BufferedWriter(new SendBufferWriter()));
...
} else if (len == telnetCloseCmd) {
cleanupWriterSocket(null);
return true;
}
{code}

pwriter should be closed in case of telnetCloseCmd


> pwriter should be closed in NIOServerCnxn#checkFourLetterWord()
> ---
>
> Key: ZOOKEEPER-1859
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1859
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> final PrintWriter pwriter = new PrintWriter(
> new BufferedWriter(new SendBufferWriter()));
> ...
> } else if (len == telnetCloseCmd) {
> cleanupWriterSocket(null);
> return true;
> }
> {code}
> pwriter should be closed in case of telnetCloseCmd



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race

2018-09-26 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629504#comment-16629504
 ] 

Ted Yu commented on ZOOKEEPER-1936:
---

Can you outline how you plan to fix this?

Thanks.

> Server exits when unable to create data directory due to race 
> --
>
> Key: ZOOKEEPER-1936
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.5.0
>Reporter: Harald Musum
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch, 
> ZOOKEEPER-1936.v2.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, 
> ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
>
>
> We sometimes see issues with the ZooKeeper server not starting, with this 
> error in the log:
> [2014-05-27 09:29:48.248] ERROR : - org.apache.zookeeper.server.ZooKeeperServerMain
> Unexpected exception, exiting abnormally
> exception=
> java.io.IOException: Unable to create data directory /home/y/var/zookeeper/version-2
>   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)
>   at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
>   at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>   at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>   at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x0201d000 nid=0x1727 runnable
> [0x7f55d7dc7000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
> at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x027df800 nid=0x1715 runnable
> [0x7f55d7ed8000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1310)
> at java.io.File.mkdirs(File.java:1337)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
> at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), the purge 
> task can run at the same time as server startup. The FileTxnSnapLog 
> constructor checks whether the data directory exists and creates it if not. 
> When both tasks do this concurrently, one mkdir call fails and the server 
> exits the JVM.
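
One race-tolerant way to express that directory check (a hedged sketch with a 
hypothetical helper, not the committed ZOOKEEPER-1936 patch): treat a false 
return from mkdirs() as fatal only if the directory still does not exist 
afterwards, so whichever task loses the race simply carries on.
{code}
import java.io.File;
import java.io.IOException;

public class DataDirUtil {
    static void ensureDirExists(File dir) throws IOException {
        // mkdirs() returns false both on failure and when another thread
        // (e.g. the PurgeTask) created the directory first; only the former
        // is fatal, so re-check with isDirectory() before giving up.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Unable to create data directory " + dir);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "zk-data/version-2");
        ensureDirExists(dir);
        System.out.println("directory present: " + dir);
    }
}
{code}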



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)