[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559223#comment-16559223 ] Hudson commented on ZOOKEEPER-2251: --- SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #122 (See [https://builds.apache.org/job/ZooKeeper-trunk/122/]) ZOOKEEPER-2251: Add Client side packet response timeout to avoid (hanm: rev 9f82798415351a20136ceb1640b1781723e51cc1) * (add) src/java/test/org/apache/zookeeper/ClientRequestTimeoutTest.java * (edit) zookeeper-docs/src/documentation/content/xdocs/zookeeperProgrammers.xml * (edit) src/java/main/org/apache/zookeeper/ZooKeeper.java * (edit) src/java/main/org/apache/zookeeper/client/ZKClientConfig.java * (edit) src/java/main/org/apache/zookeeper/ClientCnxn.java * (edit) src/java/main/org/apache/zookeeper/KeeperException.java > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2, 3.4.11 >Reporter: nijel >Assignee: Mohammad Arshad >Priority: Critical > Labels: fault, pull-request-available > Fix For: 3.6.0, 3.5.5 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552300#comment-16552300 ] Hadoop QA commented on ZOOKEEPER-2251: -- +1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1986//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1986//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1986//console This message is automatically generated. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2, 3.4.11 >Reporter: nijel >Assignee: Mohammad Arshad >Priority: Critical > Labels: fault, pull-request-available > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550168#comment-16550168 ] Hadoop QA commented on ZOOKEEPER-2251: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12791733/ZOOKEEPER-2251-04.patch against trunk revision 4c12f75a9dd546032eafb2d2840061399c2a6a5e. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3698//console This message is automatically generated. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2, 3.4.11 >Reporter: nijel >Assignee: Mohammad Arshad >Priority: Critical > Labels: fault, pull-request-available > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > Time Spent: 1h > Remaining Estimate: 0h > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547209#comment-16547209 ] Hadoop QA commented on ZOOKEEPER-2251: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12791733/ZOOKEEPER-2251-04.patch against trunk revision cea251a185435e88f54efc5defb92ec9584fc80f. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3697//console This message is automatically generated. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2, 3.4.11 >Reporter: nijel >Assignee: Mohammad Arshad >Priority: Critical > Labels: fault, pull-request-available > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > Time Spent: 20m > Remaining Estimate: 0h > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387776#comment-16387776 ] Andor Molnar commented on ZOOKEEPER-2251: - [~awkejiang] I believe that [~hanm]'s concerns have to be addressed first on GitHub. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2, 3.4.11 >Reporter: nijel >Assignee: Mohammad Arshad >Priority: Critical > Labels: fault > Fix For: 3.5.4, 3.6.0, 3.4.12 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385516#comment-16385516 ] Ke Jiang commented on ZOOKEEPER-2251: - what's the release plan for this patch? > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2, 3.4.11 >Reporter: nijel >Assignee: Mohammad Arshad >Priority: Critical > Labels: fault > Fix For: 3.5.4, 3.6.0, 3.4.12 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878608#comment-15878608 ] Mel Martinez commented on ZOOKEEPER-2251: - Update: I'm sorry, but I've been pulled away from this in my tasking. This is still a real concern for us, and I think I'll eventually be able to get back to it, but it won't happen real quick. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Mohammad Arshad >Priority: Critical > Labels: fault > Fix For: 3.5.3, 3.6.0, 3.4.11 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15820232#comment-15820232 ] Michael Han commented on ZOOKEEPER-2251: Thanks! > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819610#comment-15819610 ] Mel Martinez commented on ZOOKEEPER-2251: - I'll have to go through Lab protocols for approval to submit a patch. I'll look into it. I'll need to set up to build with Ivy against our internal Nexus as well since I'm behind a firewall.Sigh ... busywork. :D > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817267#comment-15817267 ] Michael Han commented on ZOOKEEPER-2251: The PR needs some update to address review comments before we can commit this. In particular, it would be good to have a feature flag that controls the on / off of client side timeout (for reasons commented in PR), also C client side work. Not sure if [~arshad.mohammad] still on this but if not, maybe [~m.martinez] you can start driving this by contributing a patch? > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816184#comment-15816184 ] Mel Martinez commented on ZOOKEEPER-2251: - Bumping this. We've put in lots of mitigations around this, but we consider this still an important issue for us that needs to be resolved. Really hope this can make it into 3.4.10. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15751475#comment-15751475 ] Mel Martinez commented on ZOOKEEPER-2251: - Yes, we've been figuring this out. Unfortunately, customer deployment requirements can sometimes limit our control over the environment. We are trying to make things as robust and fault tolerant as possible while still allowing for as much flexibility in how things are deployed. There are obviously limits to that! > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750088#comment-15750088 ] ASF GitHub Bot commented on ZOOKEEPER-2251: --- Github user hanm commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/119#discussion_r92530333 --- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java --- @@ -1495,10 +1504,21 @@ public ReplyHeader submitRequest(RequestHeader h, Record request, Packet packet = queuePacket(h, r, request, response, null, null, null, null, watchRegistration, watchDeregistration); synchronized (packet) { +long waitStartTime = System.currentTimeMillis(); while (!packet.finished) { -packet.wait(); +packet.wait(requestTimeout); +if (!packet.finished && ((System.currentTimeMillis() --- End diff -- Yes, and actually we have a Time class which we can use here that uses nanoTime internally for exact same reason. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750059#comment-15750059 ] Michael Han commented on ZOOKEEPER-2251: Thanks for the info, [~m.martinez]. On a side note about sharing ZK with other processes: it is generally advised to have ZK deployed on dedicated resource - sometimes you might even want to have dedicated device for storing transaction log files. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749214#comment-15749214 ] Mel Martinez commented on ZOOKEEPER-2251: - Unfortunately I don't have a simple deterministic test. But we had an environment where two of our zookeeper servers were sharing their nodes with another cpu & memory (& thread) hungry process and this seemed to result in a lot of zookeepertimeoutexceptions at the client as well as unfortunately frequent hits on this particular deadlock (which does get caught in jvisualvm thread dumps). We could not run more than a day or two on our sustained ingest tests without hitting this. We've moved the two zookeepers to dedicated nodes (the third already was on it's own node) and things have looked stable so far (on this particular issue), but my management is rightfully concerned that we should have this fix in place. We would be perfectly fine with a parameterized wait time option and a default of the current indefinite wait. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734817#comment-15734817 ] ASF GitHub Bot commented on ZOOKEEPER-2251: --- Github user wuwen5 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/119#discussion_r91682496 --- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java --- @@ -1495,10 +1504,21 @@ public ReplyHeader submitRequest(RequestHeader h, Record request, Packet packet = queuePacket(h, r, request, response, null, null, null, null, watchRegistration, watchDeregistration); synchronized (packet) { +long waitStartTime = System.currentTimeMillis(); while (!packet.finished) { -packet.wait(); +packet.wait(requestTimeout); +if (!packet.finished && ((System.currentTimeMillis() --- End diff -- System.currentTimeMillis() is dependent on System clock. It probably micro-corrected by an external programme.. I think using System.nanoTime() to measure elapsed time is the correct solution. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734548#comment-15734548 ] Michael Han commented on ZOOKEEPER-2251: Has anyone that watching this issue able to have a deterministic reproduce of the issue? The timeout solution is trivial but it's important to try to figure out root cause of packet loss otherwise we could just fix the surface and then have unexpected bug bite us. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734535#comment-15734535 ] ASF GitHub Bot commented on ZOOKEEPER-2251: --- Github user hanm commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/119#discussion_r9147 --- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java --- @@ -1495,10 +1504,21 @@ public ReplyHeader submitRequest(RequestHeader h, Record request, Packet packet = queuePacket(h, r, request, response, null, null, null, null, watchRegistration, watchDeregistration); synchronized (packet) { +long waitStartTime = System.currentTimeMillis(); while (!packet.finished) { -packet.wait(); +packet.wait(requestTimeout); --- End diff -- This unconditionally adds a timeout is a pretty significant semantic change. Whether or not we should wait indefinitely is a separate issue to discuss later, but from a compatibility point of view we'd at least provide a configuration option to let user opt in this feature rather than having this turned on by default. So by default there is no timeout and client still waits indefinitely, user who wants to opt in to turn on the timeout needs to explicit say so and provide a value they think make sense to them. Otherwise, we'd have to solve the problem of what the default timeout value we should set in ZK config if this feature is turned on by default, and that problem is hard and very likely there is no default value that makes everyone happy. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734534#comment-15734534 ] ASF GitHub Bot commented on ZOOKEEPER-2251: --- Github user hanm commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/119#discussion_r91666945 --- Diff: src/java/main/org/apache/zookeeper/KeeperException.java --- @@ -387,6 +389,8 @@ public void setCode(int code) { EPHEMERALONLOCALSESSION (EphemeralOnLocalSession), /** Attempts to remove a non-existing watcher */ NOWATCHER (-121), +/** Not received packet timely */ +TIMEOUT (-122), --- End diff -- Similar error code needs to be added to C client to make both client library compatible. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734533#comment-15734533 ] ASF GitHub Bot commented on ZOOKEEPER-2251: --- Github user hanm commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/119#discussion_r91666999 --- Diff: src/docs/src/documentation/content/xdocs/zookeeperProgrammers.xml --- @@ -1538,6 +1538,24 @@ public abstract class ServerAuthenticationProvider implements AuthenticationProv Specifies path to kinit binary. Default is "/usr/bin/kinit". + +zookeeper.request.timeout + + +New in 3.5.3: --- End diff -- This probably would not make to 3.5.3. 3.6.0 would be a more realistic value. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725558#comment-15725558 ] Edward Ribeiro commented on ZOOKEEPER-2251: --- AFAIK, if the JIRA issue states that it fixes 3.4.10, 3.5.3, and 3.6.0 then you should point to the earliest branch (branch-3.4). When applying the PR the committer can port it to branch-3.5 and master. By the way, I advise you to see if the patch applies cleanly on branch-3.4, branch-3.5 and master so you can save committer time (that is scarce) from having to deal with conflicts. ;) _Please_, correct me if I am wrong, [~fpj], [~rgs], [~breed]. :) > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723085#comment-15723085 ] Hadoop QA commented on ZOOKEEPER-2251: -- +1 overall. GitHub Pull Request Build +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/101//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/101//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/101//console This message is automatically generated. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723058#comment-15723058 ] Mel Martinez commented on ZOOKEEPER-2251: - Is 'master' the correct integration branch to point that pull request to? I'm not familiar with the ZooKeeper project's branch strategy so I'm just asking. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723037#comment-15723037 ] ASF GitHub Bot commented on ZOOKEEPER-2251: --- GitHub user arshadmohammad opened a pull request: https://github.com/apache/zookeeper/pull/119 ZOOKEEPER-2251:Add Client side packet response timeout to avoid infinite wait. Add Client side packet response timeout to avoid infinite wait. You can merge this pull request into a Git repository by running: $ git pull https://github.com/arshadmohammad/zookeeper ZOOKEEPER-2251 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/119.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #119 commit b4ac9f9f7f3742ac2bd08e383c9650cbf4962691 Author: arshadmohammad Date: 2016-12-05T18:55:10Z ZOOKEEPER-2251:Add Client side packet response timeout to avoid infinite wait. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713743#comment-15713743 ] Mel Martinez commented on ZOOKEEPER-2251: - Michael Han - we are using version 3.4.5. We would almost certainly upgrade to a newer version if we knew this problem would be addressed. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713491#comment-15713491 ] Michael Han commented on ZOOKEEPER-2251: bq. Is the problem simply that the patch needs to be updated to match the latest code? [~m.martinez] Thanks for bumping this up. Community has been working on a couple of high priority issues to prepare incoming 3.4.10 and 3.5.3 releases. This issue was not labelled with version info so it does not get much visibility in the queue. Just updated the JIRA and reviewing the patch. bq. We are definitely running into what looks to be this problem [~m.martinez] Which version of ZooKeeper you are running? On a side note, the patch does look outdated (i.e. the doc changes refer to 3.5.2 which is released already) and needs to be rebased. [~arshad.mohammad] Do you mind update the patch and send a pull request on git? We can start iterating from there. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug > Components: java client >Affects Versions: 3.4.9, 3.5.2 >Reporter: nijel >Assignee: Arshad Mohammad >Priority: Critical > Labels: fault > Fix For: 3.5.3, 3.6.0 > > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702711#comment-15702711 ] Mel Martinez commented on ZOOKEEPER-2251: - Is the problem simply that the patch needs to be updated to match the latest code? > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702576#comment-15702576 ] Hadoop QA commented on ZOOKEEPER-2251: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12791733/ZOOKEEPER-2251-04.patch against trunk revision d72f27279a13986ee0c011e1e5b34edf3a310da9. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3534//console This message is automatically generated. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702561#comment-15702561 ] Mel Martinez commented on ZOOKEEPER-2251: - Hi. We are definitely running into what looks to be this problem and I was wondering if there was any movement towards incorporating the fix into a release version? I've worked around the issue in one part of our code by wrapping my own timeout around the call that is vulnerable to the deadlock. However, this isn't possible in some cases where 3rd party code (in our case, Accumulo client & Thrift libraries) call down into the ZooKeeper client code and hit this. This has become a pretty critical issue for us. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495560#comment-15495560 ] Hadoop QA commented on ZOOKEEPER-2251: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12791733/ZOOKEEPER-2251-04.patch against trunk revision b2a484cfe743116d2531fe5d1e1d78b3960c511e. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3434//console This message is automatically generated. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495374#comment-15495374 ] Adam Whitney commented on ZOOKEEPER-2251: - I think I'm seeing this problem using activemq with replicated leveldb ... every once in a while our network loses some packets when zookeeper server is acking back to the zookeeper client in activemq, and intermittently it seems to lead to a hang on the activemq side. Unfortunately, I don't have the ability yet to get a thread dump, so I can't tell for sure, but the cause and symptoms of my issue match very well with this bug. It looks like this bug has passed all the reviews ... is there any reason why it is not yet merged? > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182742#comment-15182742 ] Hadoop QA commented on ZOOKEEPER-2251: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12791733/ZOOKEEPER-2251-04.patch against trunk revision 1733679. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3091//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3091//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3091//console This message is automatically generated. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063565#comment-15063565 ] Akihiro Suda commented on ZOOKEEPER-2251: - Hi [~arshad.mohammad] and [~nijel], Thank you for comments, and sorry for that I'm still not able to identify the root cause of the bug. I'm ok for having this timeout. I would like to respect committers' decision. Cc: [~rgs] (zktraffic author), could you please look on this? > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063517#comment-15063517 ] Hadoop QA commented on ZOOKEEPER-2251: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12765803/ZOOKEEPER-2251-03.patch against trunk revision 1720227. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2999//console This message is automatically generated. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063503#comment-15063503 ] nijel commented on ZOOKEEPER-2251: -- hi [~marshad] and [~suda] I observed this when i am doing reliability test for a banking customer Here we test for any network abnormality and packet drop. Here in the scenario packet is sent and wait for ever. Even if the server is not responding due to any reason, this issue can happen so my opinion is to have this time out since many services' high availability solution depends on zookeeper. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056085#comment-15056085 ] Arshad Mohammad commented on ZOOKEEPER-2251: I think irrespective of the reason, why packet(server response) not received, there should be some time-out. Can not keep waiting infinitely {code} synchronized (packet) { while (!packet.finished) { packet.wait(); } } {code} Seems you have good system understanding. please share if you find the reason. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055604#comment-15055604 ] Akihiro Suda commented on ZOOKEEPER-2251: - Hi [~arshad.mohammad], thank you for the response. The attached patch assumes that a response packet can be "lost". But I'm still not sure what "lost" means. https://issues.apache.org/jira/browse/ZOOKEEPER-2251?focusedCommentId=14706468&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14706468 If TCP packets are actually lost, Linux kernel (on the server) should retransmit packets. If even retransmitted packets are lost, client kernel should raise {{ETIMEDOUT}} after 924.6 seconds by default. (It depends on the value of {{/proc/sys/net/ipv4/tcp_retries2}}.) So I think infinite wait should not happen even if packets are lost. I can come up with following possibilities: 1. Client kernel raised {{ETIMEDOUT}}, and JVM raised a corresponding {{java.lang.IOException}}. But client ZK did not catch the {{IOException}} correctly. It's a bug of client. 2. Server sent a TCP FIN packet. Client received the TCP FIN packet and raised {{IOException}}, but client ZK did not catch the {{IOException}} correctly. It's a bug of client. 3. No TCP FIN packet were sent. It's a bug of server. 4. Bug of kernel or JVM. I think we should dig out what really happens when we hit the bug, before merging the patch and closing the JIRA. (No matter what is the real cause of the bug, the patch is useful! Because non-privileged users cannot modify {{/proc/sys/net/ipv4/*}}.) I would like to hear your comments. I'm also trying to reproduce the bug again, and digging out what are really happening. Thanks! > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055549#comment-15055549 ] Arshad Mohammad commented on ZOOKEEPER-2251: Thanks [~suda] for sharing that you experienced this problem. Regarding the status, implementation was done long back, review and commit is pending. Meanwhile many changes have happened so the current patch need to be rebased. I will soon rebase and submit new patch > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054528#comment-15054528 ] Akihiro Suda commented on ZOOKEEPER-2251: - I recently experienced this bug in my cluster. Any update? > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951011#comment-14951011 ] Edward Ribeiro commented on ZOOKEEPER-2251: --- Oh, I see now. :) Just a small review comment: give preference to use StringBuilder instead of StringBuffer, specially if the use is as a method variable, because StringBuffer methods are synchronized while StringBuilder methods are not. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950697#comment-14950697 ] Hadoop QA commented on ZOOKEEPER-2251: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12765803/ZOOKEEPER-2251-03.patch against trunk revision 1706631. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2908//console This message is automatically generated. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, > ZOOKEEPER-2251-03.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950237#comment-14950237 ] Arshad Mohammad commented on ZOOKEEPER-2251: Thanks [~eribeiro]for looking into the patch. Quorum servers are required because this patch also hits the logic of reconnect. now connecting to all the servers. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948758#comment-14948758 ] Edward Ribeiro commented on ZOOKEEPER-2251: --- [~arshad.mohammad], the patch looks good, congrats. One question: why {{ZKClientHangTest}} creates 3 servers if the core of the test needs just one ({{clientPorts[0]}}) that the client connects to? More servers are more time to wait for the unit test to run, slowing things down. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948362#comment-14948362 ] nijel commented on ZOOKEEPER-2251: -- One more comment can you add this config in document as well ? > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948352#comment-14948352 ] nijel commented on ZOOKEEPER-2251: -- thanks [~arshad.mohammad] for the patch overall looks good few comments 1. The log message {code} LOG.info("{} configured value is {}.", ZOOKEEPER_REQUEST_TIMEOUT, 148 requestTimeoutProp); {code} can be written after the value is read. Or in invalid case duplicate logs will come 2. Change test class and test method name to reflect timeout feature 3. Number format exception logs looks like not a complete sentence. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948082#comment-14948082 ] Hadoop QA commented on ZOOKEEPER-2251: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12765520/ZOOKEEPER-2251-02.patch against trunk revision 1706631. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2906//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2906//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2906//console This message is automatically generated. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946889#comment-14946889 ] Hadoop QA commented on ZOOKEEPER-2251: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12765389/ZOOKEEPER-2251-01.patch against trunk revision 1706631. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2905//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2905//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2905//console This message is automatically generated. > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > Attachments: ZOOKEEPER-2251-01.patch > > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946850#comment-14946850 ] Arshad Mohammad commented on ZOOKEEPER-2251: How to reproduce the: >From the logs it is very clear that Client is waiting in >{{org.apache.zookeeper.ClientCnxn.submitRequest()}} on Packet object. {code} synchronized (packet) { while (!packet.finished) { packet.wait(); } } {code} Above wait ends when packet.notifyAll() is called from {{org.apache.zookeeper.ClientCnxn.finishPacket(Packet)}} To reproduce client hang scenario through test we can override the {{org.apache.zookeeper.ClientCnxn.finishPacket(Packet)}} method, so that it does not call packet.notifyAll(). > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730278#comment-14730278 ] Akihiro Suda commented on ZOOKEEPER-2251: - Hi [~nijel], thank you for the information. The [{{wait()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515] might be waiting for this [{{notifyAll()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxn.java#L736]. A typical call stack for the notifyAll() is as follows: - {{Object.notifyAll()}} - [{{ClientCnxn.finishPacket()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxn.java#L736] - [{{ClientCnxn$SendThread.readResponse()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxn.java#L948] - [{{ClientCnxnSocketNIO.doIO()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java#L99] - [{{ClientCnxnSocketNIO.doTransport()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java#L361] - [{{ClientCnxn$SendThread.run()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1236] If you can still reproduce the bug, could you please add some logs for the above methods and share the result (plus ZooKeeper version and YARN version)? Thanks > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel >Assignee: Arshad Mohammad > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728996#comment-14728996 ] nijel commented on ZOOKEEPER-2251: -- hi [~suda] i got this issue from one of our site. The issue is from ResourceManager. I could get very few info. The thread which got blocked is as follows {code} StandByTransitionThread Handler" daemon prio=10 tid=0x0119a800 nid=0x641e3 in Object.wait() [0x7ff3f67ab000] 1836java.lang.Thread.State: WAITING (on object monitor) 1837 at java.lang.Object.wait(Native Method) 1838 - waiting on <0x0007855f2010> (a org.apache.zookeeper.ClientCnxn$Packet) 1839 at java.lang.Object.wait(Object.java:503) 1840 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1414) 1841 - locked <0x0007855f2010> (a org.apache.zookeeper.ClientCnxn$Packet) 1842 at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1386) 1843 at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:677) 1844 - locked <0x00078660daf0> (a org.apache.zookeeper.ZooKeeper) 1845 at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.closeZkClients(ZKRMStateStore.java:360) 1846 - locked <0x000781087940> (a org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore) 1847 at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.closeInternal(ZKRMStateStore.java:382) 1848 - locked <0x000781087940> (a org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore) 1849 at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:486) 1850 at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) 1851 - locked <0x0007823b2a80> (a java.lang.Object) 1852 at org.apache.hadoop.service.AbstractService.close(AbstractService.java:250) 1853 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:549) 1854 at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) 1855 - locked <0x0007823d0610> (a java.lang.Object) 1856 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.stopActiveServices(ResourceManager.java:958) 1857 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:1018) 1858 - locked <0x000780059c60> (a org.apache.hadoop.yarn.server.resourcemanager.ResourceManager) 1859 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.handleTransitionToStandBy(ResourceManager.java:703) 1860 at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StandByTransitionThread.run(RMStateStore.java:884) {code} My analysis is that it is waiting for the packet response Hope this helps > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706468#comment-14706468 ] Akihiro Suda commented on ZOOKEEPER-2251: - Hello, I'm trying to reproduce this bug. However I'm not sure about the meaning of "only few packets missed". If some IP packets were lost, TCP layer (in the OS kernel) should be able to detect and retransmit lost packets. So I guess the lost packets are eventually delivered to the client OS, but ZK client get hangs due to unexpected delay of the delivered packets. Could you please share your client workload and its thread dump? BTW, for partial packet loss simulation, [Linux netem|http://www.linuxfoundation.org/collaborate/workgroups/networking/netem#Packet_loss] or my [Earthquake tool|http://osrg.github.io/earthquake/post/zookeeper-2212/](found ZOOKEEPER-2212, and perhaps also reproduced ZOOKEEPER-2080) might be useful. (Wireshark is a packet analyzer, and doesn't provide any network simulation feature.) > Add Client side packet response timeout to avoid infinite wait. > --- > > Key: ZOOKEEPER-2251 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251 > Project: ZooKeeper > Issue Type: Bug >Reporter: nijel > > I came across one issue related to Client side packet response timeout In my > cluster many packet drops happened for some time. > One observation is the zookeeper client got hanged. As per the thread dump it > is waiting for the response/ACK for the operation performed (synchronous API > used here). > I am using > zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory > Since only few packets missed there is no DISCONNECTED event occurred. > Need add a "response time out" for the operations or packets. > *Comments from [~rakeshr]* > My observation about the problem:- > * Can use tools like 'Wireshark' to simulate the artificial packet loss. > * Assume there is only one packet in the 'outgoingQueue' and unfortunately > the server response packet lost. Now, client will enter into infinite > waiting. > https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515 > * Probably we can discuss more about this problem and possible solutions(add > packet ACK timeout or another better approach) in the jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)