[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2018-07-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559223#comment-16559223
 ] 

Hudson commented on ZOOKEEPER-2251:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #122 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/122/])
ZOOKEEPER-2251: Add Client side packet response timeout to avoid (hanm: rev 
9f82798415351a20136ceb1640b1781723e51cc1)
* (add) src/java/test/org/apache/zookeeper/ClientRequestTimeoutTest.java
* (edit) zookeeper-docs/src/documentation/content/xdocs/zookeeperProgrammers.xml
* (edit) src/java/main/org/apache/zookeeper/ZooKeeper.java
* (edit) src/java/main/org/apache/zookeeper/client/ZKClientConfig.java
* (edit) src/java/main/org/apache/zookeeper/ClientCnxn.java
* (edit) src/java/main/org/apache/zookeeper/KeeperException.java


> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2, 3.4.11
>Reporter: nijel
>Assignee: Mohammad Arshad
>Priority: Critical
>  Labels: fault, pull-request-available
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2018-07-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552300#comment-16552300
 ] 

Hadoop QA commented on ZOOKEEPER-2251:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1986//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1986//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1986//console

This message is automatically generated.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2, 3.4.11
>Reporter: nijel
>Assignee: Mohammad Arshad
>Priority: Critical
>  Labels: fault, pull-request-available
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2018-07-19 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550168#comment-16550168
 ] 

Hadoop QA commented on ZOOKEEPER-2251:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12791733/ZOOKEEPER-2251-04.patch
  against trunk revision 4c12f75a9dd546032eafb2d2840061399c2a6a5e.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3698//console

This message is automatically generated.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2, 3.4.11
>Reporter: nijel
>Assignee: Mohammad Arshad
>Priority: Critical
>  Labels: fault, pull-request-available
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2018-07-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547209#comment-16547209
 ] 

Hadoop QA commented on ZOOKEEPER-2251:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12791733/ZOOKEEPER-2251-04.patch
  against trunk revision cea251a185435e88f54efc5defb92ec9584fc80f.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3697//console

This message is automatically generated.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2, 3.4.11
>Reporter: nijel
>Assignee: Mohammad Arshad
>Priority: Critical
>  Labels: fault, pull-request-available
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2018-03-06 Thread Andor Molnar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387776#comment-16387776
 ] 

Andor Molnar commented on ZOOKEEPER-2251:
-

[~awkejiang]

I believe that [~hanm]'s concerns have to be addressed first on GitHub.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2, 3.4.11
>Reporter: nijel
>Assignee: Mohammad Arshad
>Priority: Critical
>  Labels: fault
> Fix For: 3.5.4, 3.6.0, 3.4.12
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2018-03-04 Thread Ke Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385516#comment-16385516
 ] 

Ke Jiang commented on ZOOKEEPER-2251:
-

what's the release plan for this patch?

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2, 3.4.11
>Reporter: nijel
>Assignee: Mohammad Arshad
>Priority: Critical
>  Labels: fault
> Fix For: 3.5.4, 3.6.0, 3.4.12
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2017-02-22 Thread Mel Martinez (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878608#comment-15878608
 ] 

Mel Martinez commented on ZOOKEEPER-2251:
-

Update:  I'm sorry, but I've been pulled away from this in my tasking.   This 
is still a real concern for us, and I think I'll eventually be able to get back 
to it, but it won't happen real quick.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Mohammad Arshad
>Priority: Critical
>  Labels: fault
> Fix For: 3.5.3, 3.6.0, 3.4.11
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2017-01-11 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15820232#comment-15820232
 ] 

Michael Han commented on ZOOKEEPER-2251:


Thanks!

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2017-01-11 Thread Mel Martinez (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819610#comment-15819610
 ] 

Mel Martinez commented on ZOOKEEPER-2251:
-

I'll have to go through Lab protocols for approval to submit a patch.  I'll 
look into it.  

I'll need to set up to build with Ivy against our internal Nexus as well since 
I'm behind a firewall.Sigh ... busywork.  :D

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2017-01-10 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817267#comment-15817267
 ] 

Michael Han commented on ZOOKEEPER-2251:


The PR needs some update to address review comments before we can commit this. 
In particular, it would be good to have a feature flag that controls the on / 
off of client side timeout (for reasons commented in PR), also C client side 
work.

Not sure if [~arshad.mohammad] still on this but if not, maybe [~m.martinez] 
you can start driving this by contributing a patch? 

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2017-01-10 Thread Mel Martinez (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816184#comment-15816184
 ] 

Mel Martinez commented on ZOOKEEPER-2251:
-

Bumping this.   We've put in lots of mitigations around this, but we consider 
this still an important issue for us that needs to be resolved.   Really hope 
this can make it into 3.4.10.



> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-15 Thread Mel Martinez (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15751475#comment-15751475
 ] 

Mel Martinez commented on ZOOKEEPER-2251:
-

Yes, we've been figuring this out.

Unfortunately, customer deployment requirements can sometimes limit our control 
over the environment.  We are trying to make things as robust and fault 
tolerant as possible while still allowing for as much flexibility in how things 
are deployed.  There are obviously limits to that!


> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750088#comment-15750088
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2251:
---

Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/119#discussion_r92530333
  
--- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java ---
@@ -1495,10 +1504,21 @@ public ReplyHeader submitRequest(RequestHeader h, 
Record request,
 Packet packet = queuePacket(h, r, request, response, null, null, 
null,
 null, watchRegistration, watchDeregistration);
 synchronized (packet) {
+long waitStartTime = System.currentTimeMillis();
 while (!packet.finished) {
-packet.wait();
+packet.wait(requestTimeout);
+if (!packet.finished && ((System.currentTimeMillis()
--- End diff --

Yes, and actually we have a Time class which we can use here that uses 
nanoTime internally for exact same reason. 


> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-14 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750059#comment-15750059
 ] 

Michael Han commented on ZOOKEEPER-2251:


Thanks for the info, [~m.martinez]. 

On a side note about sharing ZK with other processes: it is generally advised 
to have ZK deployed on dedicated resource - sometimes you might even want to 
have dedicated device for storing transaction log files. 

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-14 Thread Mel Martinez (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749214#comment-15749214
 ] 

Mel Martinez commented on ZOOKEEPER-2251:
-

Unfortunately I don't have a simple deterministic test.  But we had an 
environment where two of our zookeeper servers were sharing their nodes with 
another cpu & memory (& thread) hungry process and this seemed to result in a 
lot of zookeepertimeoutexceptions at the client as well as unfortunately 
frequent hits on this particular deadlock (which does get caught in jvisualvm 
thread dumps).   We could not run more than a day or two on our sustained 
ingest tests without hitting this.

We've moved the two zookeepers to dedicated nodes  (the third already was on 
it's own node) and things have looked stable so far (on this particular issue), 
but my management is rightfully concerned that we should have this fix in place.

We would be perfectly fine with a parameterized wait time option and a default 
of the current indefinite wait.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734817#comment-15734817
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2251:
---

Github user wuwen5 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/119#discussion_r91682496
  
--- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java ---
@@ -1495,10 +1504,21 @@ public ReplyHeader submitRequest(RequestHeader h, 
Record request,
 Packet packet = queuePacket(h, r, request, response, null, null, 
null,
 null, watchRegistration, watchDeregistration);
 synchronized (packet) {
+long waitStartTime = System.currentTimeMillis();
 while (!packet.finished) {
-packet.wait();
+packet.wait(requestTimeout);
+if (!packet.finished && ((System.currentTimeMillis()
--- End diff --

System.currentTimeMillis() is dependent on System clock. It probably 
micro-corrected by an external programme..

I think using System.nanoTime() to measure elapsed time is the correct 
solution.


> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-08 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734548#comment-15734548
 ] 

Michael Han commented on ZOOKEEPER-2251:


Has anyone that watching this issue able to have a deterministic reproduce of 
the issue? The timeout solution is trivial but it's important to try to figure 
out root cause of packet loss otherwise we could just fix the surface and then 
have unexpected bug bite us. 

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734535#comment-15734535
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2251:
---

Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/119#discussion_r9147
  
--- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java ---
@@ -1495,10 +1504,21 @@ public ReplyHeader submitRequest(RequestHeader h, 
Record request,
 Packet packet = queuePacket(h, r, request, response, null, null, 
null,
 null, watchRegistration, watchDeregistration);
 synchronized (packet) {
+long waitStartTime = System.currentTimeMillis();
 while (!packet.finished) {
-packet.wait();
+packet.wait(requestTimeout);
--- End diff --

This unconditionally adds a timeout is a pretty significant semantic 
change. Whether or not we should wait indefinitely is a separate issue to 
discuss later, but from a compatibility point of view we'd at least provide a 
configuration option to let user opt in this feature rather than having this 
turned on by default. So by default there is no timeout and client still waits 
indefinitely, user who wants to opt in to turn on the timeout needs to explicit 
say so and provide a value they think make sense to them. Otherwise, we'd have 
to solve the problem of what the default timeout value we should set in ZK 
config if this feature is turned on by default, and that problem is hard and 
very likely there is no default value that makes everyone happy.


> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734534#comment-15734534
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2251:
---

Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/119#discussion_r91666945
  
--- Diff: src/java/main/org/apache/zookeeper/KeeperException.java ---
@@ -387,6 +389,8 @@ public void setCode(int code) {
 EPHEMERALONLOCALSESSION (EphemeralOnLocalSession),
 /** Attempts to remove a non-existing watcher */
 NOWATCHER (-121),
+/** Not received packet timely */
+TIMEOUT (-122),
--- End diff --

Similar error code needs to be added to C client to make both client 
library compatible.


> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734533#comment-15734533
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2251:
---

Github user hanm commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/119#discussion_r91666999
  
--- Diff: src/docs/src/documentation/content/xdocs/zookeeperProgrammers.xml 
---
@@ -1538,6 +1538,24 @@ public abstract class ServerAuthenticationProvider 
implements AuthenticationProv
 Specifies path to kinit binary. Default is 
"/usr/bin/kinit".
 
 
+
+zookeeper.request.timeout
+
+
+New in 3.5.3:
--- End diff --

This probably would not make to 3.5.3. 3.6.0 would be a more realistic 
value.


> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-06 Thread Edward Ribeiro (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725558#comment-15725558
 ] 

Edward Ribeiro commented on ZOOKEEPER-2251:
---

AFAIK, if the JIRA issue states that it fixes 3.4.10, 3.5.3, and 3.6.0 then you 
should point to the earliest branch (branch-3.4). When applying the PR the 
committer can port it to branch-3.5 and master. By the way, I advise you to see 
if the patch applies cleanly on branch-3.4, branch-3.5 and master so you can 
save committer time (that is scarce) from having to deal with conflicts. ;)

_Please_, correct me if I am wrong, [~fpj], [~rgs], [~breed]. :)

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723085#comment-15723085
 ] 

Hadoop QA commented on ZOOKEEPER-2251:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/101//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/101//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/101//console

This message is automatically generated.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-05 Thread Mel Martinez (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723058#comment-15723058
 ] 

Mel Martinez commented on ZOOKEEPER-2251:
-

Is 'master' the correct integration branch to point that pull request to?  

I'm not familiar with the ZooKeeper project's branch strategy so I'm just 
asking.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723037#comment-15723037
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2251:
---

GitHub user arshadmohammad opened a pull request:

https://github.com/apache/zookeeper/pull/119

ZOOKEEPER-2251:Add Client side packet response timeout to avoid infinite 
wait.

Add Client side packet response timeout to avoid infinite wait.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arshadmohammad/zookeeper ZOOKEEPER-2251

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/119.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #119


commit b4ac9f9f7f3742ac2bd08e383c9650cbf4962691
Author: arshadmohammad 
Date:   2016-12-05T18:55:10Z

ZOOKEEPER-2251:Add Client side packet response timeout to avoid infinite 
wait.




> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.4.10, 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-01 Thread Mel Martinez (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713743#comment-15713743
 ] 

Mel Martinez commented on ZOOKEEPER-2251:
-

Michael Han - we are using version 3.4.5.  

We would almost certainly upgrade to a newer version if we knew this problem 
would be addressed. 



> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-12-01 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713491#comment-15713491
 ] 

Michael Han commented on ZOOKEEPER-2251:


bq. Is the problem simply that the patch needs to be updated to match the 
latest code?

[~m.martinez] Thanks for bumping this up. Community has been working on a 
couple of high priority issues to prepare incoming 3.4.10 and 3.5.3 releases. 
This issue was not labelled with version info so it does not get much 
visibility in the queue. Just updated the JIRA and reviewing the patch.

bq. We are definitely running into what looks to be this problem
[~m.martinez] Which version of ZooKeeper you are running?

On a side note, the patch does look outdated (i.e. the doc changes refer to 
3.5.2 which is released already) and needs to be rebased. [~arshad.mohammad] Do 
you mind update the patch and send a pull request on git? We can start 
iterating from there.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.9, 3.5.2
>Reporter: nijel
>Assignee: Arshad Mohammad
>Priority: Critical
>  Labels: fault
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-11-28 Thread Mel Martinez (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702711#comment-15702711
 ] 

Mel Martinez commented on ZOOKEEPER-2251:
-

Is the problem simply that the patch needs to be updated to match the latest 
code?

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-11-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702576#comment-15702576
 ] 

Hadoop QA commented on ZOOKEEPER-2251:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12791733/ZOOKEEPER-2251-04.patch
  against trunk revision d72f27279a13986ee0c011e1e5b34edf3a310da9.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3534//console

This message is automatically generated.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-11-28 Thread Mel Martinez (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702561#comment-15702561
 ] 

Mel Martinez commented on ZOOKEEPER-2251:
-

Hi.  We are definitely running into what looks to be this problem and I was 
wondering if there was any movement towards incorporating the fix into a 
release version?

I've worked around the issue in one part of our code by wrapping my own timeout 
around the call that is vulnerable to the deadlock.

However, this isn't possible in some cases where 3rd party code (in our case, 
Accumulo client & Thrift libraries) call down into the ZooKeeper client code 
and hit this.

This has become a pretty critical issue for us. 

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-09-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495560#comment-15495560
 ] 

Hadoop QA commented on ZOOKEEPER-2251:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12791733/ZOOKEEPER-2251-04.patch
  against trunk revision b2a484cfe743116d2531fe5d1e1d78b3960c511e.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3434//console

This message is automatically generated.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-09-15 Thread Adam Whitney (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495374#comment-15495374
 ] 

Adam Whitney commented on ZOOKEEPER-2251:
-

I think I'm seeing this problem using activemq with replicated leveldb ... 
every once in a while our network loses some packets when zookeeper server is 
acking back to the zookeeper client in activemq, and intermittently it seems to 
lead to a hang on the activemq side. Unfortunately, I don't have the ability 
yet to get a thread dump, so I can't tell for sure, but the cause and symptoms 
of my issue match very well with this bug. It looks like this bug has passed 
all the reviews ... is there any reason why it is not yet merged?

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2016-03-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182742#comment-15182742
 ] 

Hadoop QA commented on ZOOKEEPER-2251:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12791733/ZOOKEEPER-2251-04.patch
  against trunk revision 1733679.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3091//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3091//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3091//console

This message is automatically generated.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-12-17 Thread Akihiro Suda (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063565#comment-15063565
 ] 

Akihiro Suda commented on ZOOKEEPER-2251:
-

Hi [~arshad.mohammad] and [~nijel],
Thank you for comments,
and sorry for that I'm still not able to identify the root cause of the bug.

I'm ok for having this timeout.
I would like to respect committers' decision.

Cc: [~rgs] (zktraffic author), could you please look on this?


> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063517#comment-15063517
 ] 

Hadoop QA commented on ZOOKEEPER-2251:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12765803/ZOOKEEPER-2251-03.patch
  against trunk revision 1720227.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2999//console

This message is automatically generated.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-12-17 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063503#comment-15063503
 ] 

nijel commented on ZOOKEEPER-2251:
--

hi [~marshad] and [~suda]

I observed this when i am doing reliability test for a banking customer
Here we test for any network abnormality and packet drop.

Here in the scenario packet is sent and wait for ever. Even if the server is 
not responding due to any reason, this issue can happen

so my opinion is to have this time out since many services' high availability 
solution depends on zookeeper.



> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-12-14 Thread Arshad Mohammad (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056085#comment-15056085
 ] 

Arshad Mohammad commented on ZOOKEEPER-2251:


I think irrespective of the reason, why packet(server response) not received, 
there should be some time-out. Can not keep waiting infinitely
{code}
synchronized (packet) {
while (!packet.finished) {
packet.wait();
}
}
{code}
Seems you have good system understanding. please share if you find the reason.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-12-14 Thread Akihiro Suda (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055604#comment-15055604
 ] 

Akihiro Suda commented on ZOOKEEPER-2251:
-

Hi [~arshad.mohammad], thank you for the response.

The attached patch assumes that a response packet can be "lost".
But I'm still not sure what "lost" means.
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?focusedCommentId=14706468&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14706468

If TCP packets are actually lost, Linux kernel (on the server) should 
retransmit packets.
If even retransmitted packets are lost, client kernel should raise 
{{ETIMEDOUT}} after 924.6 seconds by default. (It depends on the value of 
{{/proc/sys/net/ipv4/tcp_retries2}}.)

So I think infinite wait should not happen even if packets are lost.

I can come up with following possibilities:

1. Client kernel raised {{ETIMEDOUT}}, and JVM raised a corresponding 
{{java.lang.IOException}}. But client ZK did not catch the {{IOException}} 
correctly. It's a bug of client.

2. Server sent a TCP FIN packet. Client received the TCP FIN packet and raised 
{{IOException}}, but client ZK did not catch the {{IOException}} correctly. 
It's a bug of client.

3. No TCP FIN packet were sent. It's a bug of server.

4. Bug of kernel or JVM.


I think we should dig out what really happens when we hit the bug, before 
merging the patch and closing the JIRA.

(No matter what is the real cause of the bug, the patch is useful! Because 
non-privileged users cannot modify {{/proc/sys/net/ipv4/*}}.)

I would like to hear your comments.

I'm also trying to reproduce the bug again, and digging out what are really 
happening.

Thanks!


> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-12-13 Thread Arshad Mohammad (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055549#comment-15055549
 ] 

Arshad Mohammad commented on ZOOKEEPER-2251:


Thanks [~suda] for sharing that you experienced this problem. 
Regarding the status, implementation was done long back, review and commit is 
pending. Meanwhile many changes have happened so the current patch need to be 
rebased. I will soon rebase and submit new patch

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-12-12 Thread Akihiro Suda (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054528#comment-15054528
 ] 

Akihiro Suda commented on ZOOKEEPER-2251:
-

I recently experienced this bug in my cluster.
Any update?

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-10-09 Thread Edward Ribeiro (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951011#comment-14951011
 ] 

Edward Ribeiro commented on ZOOKEEPER-2251:
---

Oh, I see now. :) Just a small review comment: give preference to use 
StringBuilder instead of StringBuffer, specially if the use is as a method 
variable, because StringBuffer methods are synchronized while StringBuilder 
methods are not.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-10-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950697#comment-14950697
 ] 

Hadoop QA commented on ZOOKEEPER-2251:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12765803/ZOOKEEPER-2251-03.patch
  against trunk revision 1706631.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2908//console

This message is automatically generated.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-10-09 Thread Arshad Mohammad (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950237#comment-14950237
 ] 

Arshad Mohammad commented on ZOOKEEPER-2251:


Thanks [~eribeiro]for looking into the patch. Quorum servers are required 
because this patch also hits the logic of reconnect.
now connecting to all the servers.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-10-08 Thread Edward Ribeiro (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948758#comment-14948758
 ] 

Edward Ribeiro commented on ZOOKEEPER-2251:
---

[~arshad.mohammad], the patch looks good, congrats. One question: why 
{{ZKClientHangTest}} creates 3 servers if the core of the test needs just one 
({{clientPorts[0]}}) that the client connects to? More servers are more time to 
wait for the unit test to run, slowing things down.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-10-08 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948362#comment-14948362
 ] 

nijel commented on ZOOKEEPER-2251:
--

One more comment
can you add this config in document as well ?

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-10-08 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948352#comment-14948352
 ] 

nijel commented on ZOOKEEPER-2251:
--

thanks [~arshad.mohammad] for the patch
overall looks good

few comments
1. The log message 
{code}
LOG.info("{} configured value is {}.", ZOOKEEPER_REQUEST_TIMEOUT,
148 requestTimeoutProp);
{code}
can be written after the value is read. Or in invalid case duplicate logs will 
come

2. Change test class and test method name to reflect timeout feature

3. Number format exception logs looks like not a complete sentence.




> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948082#comment-14948082
 ] 

Hadoop QA commented on ZOOKEEPER-2251:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12765520/ZOOKEEPER-2251-02.patch
  against trunk revision 1706631.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2906//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2906//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2906//console

This message is automatically generated.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946889#comment-14946889
 ] 

Hadoop QA commented on ZOOKEEPER-2251:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12765389/ZOOKEEPER-2251-01.patch
  against trunk revision 1706631.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2905//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2905//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2905//console

This message is automatically generated.

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
> Attachments: ZOOKEEPER-2251-01.patch
>
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-10-07 Thread Arshad Mohammad (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946850#comment-14946850
 ] 

Arshad Mohammad commented on ZOOKEEPER-2251:


How to reproduce the:
>From the logs it is very clear that Client is waiting in 
>{{org.apache.zookeeper.ClientCnxn.submitRequest()}} on Packet object.
{code}
synchronized (packet) {
while (!packet.finished) {
packet.wait();
}
}
{code}
Above wait ends when packet.notifyAll() is called from 
{{org.apache.zookeeper.ClientCnxn.finishPacket(Packet)}}
To reproduce client hang scenario through test we can override the 
{{org.apache.zookeeper.ClientCnxn.finishPacket(Packet)}} method, so that it 
does not call packet.notifyAll(). 

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-09-03 Thread Akihiro Suda (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730278#comment-14730278
 ] 

Akihiro Suda commented on ZOOKEEPER-2251:
-

Hi [~nijel], thank you for the information.

The 
[{{wait()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515]
 might be waiting for this 
[{{notifyAll()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxn.java#L736].

A typical call stack for the notifyAll() is as follows:
- {{Object.notifyAll()}}
- 
[{{ClientCnxn.finishPacket()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxn.java#L736]
- 
[{{ClientCnxn$SendThread.readResponse()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxn.java#L948]
- 
[{{ClientCnxnSocketNIO.doIO()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java#L99]
- 
[{{ClientCnxnSocketNIO.doTransport()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxnSocketNIO.java#L361]
- 
[{{ClientCnxn$SendThread.run()}}|https://github.com/apache/zookeeper/blob/7f10f9c53a48b296dd27c7a104b303b10b045a89/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1236]

If you can still reproduce the bug, could you please add some logs for the 
above methods and share the result (plus ZooKeeper version and YARN version)?

Thanks


> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>Assignee: Arshad Mohammad
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-09-03 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728996#comment-14728996
 ] 

nijel commented on ZOOKEEPER-2251:
--

hi [~suda]
i got this issue from one of our site. The issue is from ResourceManager.
I could get very few info. The thread which got blocked is as follows

{code}
StandByTransitionThread Handler" daemon prio=10 tid=0x0119a800 
nid=0x641e3 in Object.wait() [0x7ff3f67ab000]
1836java.lang.Thread.State: WAITING (on object monitor)
1837 at java.lang.Object.wait(Native Method)
1838 - waiting on <0x0007855f2010> (a 
org.apache.zookeeper.ClientCnxn$Packet)
1839 at java.lang.Object.wait(Object.java:503)
1840 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1414)
1841 - locked <0x0007855f2010> (a 
org.apache.zookeeper.ClientCnxn$Packet)
1842 at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1386)
1843 at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:677)
1844 - locked <0x00078660daf0> (a org.apache.zookeeper.ZooKeeper)
1845 at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.closeZkClients(ZKRMStateStore.java:360)
1846 - locked <0x000781087940> (a 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore)
1847 at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.closeInternal(ZKRMStateStore.java:382)
1848 - locked <0x000781087940> (a 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore)
1849 at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:486)
1850 at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
1851 - locked <0x0007823b2a80> (a java.lang.Object)
1852 at 
org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
1853 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:549)
1854 at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
1855 - locked <0x0007823d0610> (a java.lang.Object)
1856 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.stopActiveServices(ResourceManager.java:958)
1857 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:1018)
1858 - locked <0x000780059c60> (a 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager)
1859 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.handleTransitionToStandBy(ResourceManager.java:703)
1860 at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StandByTransitionThread.run(RMStateStore.java:884)
{code}

My analysis is that it is waiting for the packet response
Hope this helps

> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

2015-08-21 Thread Akihiro Suda (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706468#comment-14706468
 ] 

Akihiro Suda commented on ZOOKEEPER-2251:
-

Hello, I'm trying to reproduce this bug.

However I'm not sure about the meaning of "only few packets missed".
If some IP packets were lost, TCP layer (in the OS kernel) should be able to 
detect and retransmit lost packets.
So I guess the lost packets are eventually delivered to the client OS, but ZK 
client get hangs due to unexpected delay of the delivered packets.

Could you please share your client workload and its thread dump?

BTW, for partial packet loss simulation, [Linux 
netem|http://www.linuxfoundation.org/collaborate/workgroups/networking/netem#Packet_loss]
 or my [Earthquake 
tool|http://osrg.github.io/earthquake/post/zookeeper-2212/](found 
ZOOKEEPER-2212, and perhaps also reproduced ZOOKEEPER-2080) might be useful. 
(Wireshark is a packet analyzer, and doesn't provide any network simulation 
feature.)


> Add Client side packet response timeout to avoid infinite wait.
> ---
>
> Key: ZOOKEEPER-2251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: nijel
>
> I came across one issue related to Client side packet response timeout In my 
> cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it 
> is waiting for the response/ACK for the operation performed (synchronous API 
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately 
> the server response packet lost. Now, client will enter into infinite 
> waiting. 
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add 
> packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)