[jira] [Commented] (ZOOKEEPER-2910) zookeeper listening on unknown random port

2017-10-21 Thread ken huang (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214154#comment-16214154
 ] 

ken huang commented on ZOOKEEPER-2910:
--

As you say, you are running three servers on a single machine (17/18/19), so 
there should be nine ports listening on that machine, as follows:
{code}
for zk17: 2181 3181 4181
for zk18: 2182 3182 4182
for zk19: 2183 3183 4183
{code}
But I just found that zk19 is fine, while zk18 and zk17 are missing the port 
that followers use to connect to the leader. 
I'm not sure whether my understanding is correct.
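
For reference, a minimal sketch of what a matching zoo.cfg for zk17 might look 
like (the host and dataDir are illustrative assumptions, not taken from the 
actual setup). As far as I understand, the first of the two quorum ports (the 
318x one here) is bound only by the current leader, and followers merely 
connect to it, which would explain why only one of the three servers shows 
that port listening.
{code}
# hypothetical zoo.cfg for zk17; zk18/zk19 differ only in clientPort and dataDir
clientPort=2181
dataDir=/var/lib/zk17
# server.<id>=<host>:<port the leader accepts followers on>:<election port>
server.17=localhost:3181:4181
server.18=localhost:3182:4182
server.19=localhost:3183:4183
{code}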


> zookeeper listening on unknown random port
> --
>
> Key: ZOOKEEPER-2910
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2910
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.8
> Environment: CentOS 7
> Zookeeper 3.4.8
> JDK 1.8.0_111
>Reporter: ken huang
>Assignee: Patrick Hunt
>
> When ZooKeeper starts, it listens on three ports:
> 2181 for client connections
> 3888 for leader election
> a random one, for what?
> There are three ports configured in zoo.cfg (2181, 2888, 3888), but 2888 is 
> not listened on.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


ZooKeeper-trunk - Build # 3586 - Still Failing

2017-10-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/3586/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 62.46 MB...]
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
[junit] at 
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit] at java.lang.Thread.run(Thread.java:745)
[junit] 2017-10-21 23:30:47,333 [myid:] - INFO  [main:ZooKeeper@1332] - 
Session: 0x1022decb9c9 closed
[junit] 2017-10-21 23:30:47,333 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 149849
[junit] 2017-10-21 23:30:47,333 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 1648
[junit] 2017-10-21 23:30:47,333 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
testWatcherAutoResetWithLocal
[junit] 2017-10-21 23:30:47,333 [myid:] - INFO  [main:ClientBase@601] - 
tearDown starting
[junit] 2017-10-21 23:30:47,333 [myid:] - INFO  [main:ClientBase@571] - 
STOPPING server
[junit] 2017-10-21 23:30:47,334 [myid:] - INFO  
[main:NettyServerCnxnFactory@464] - shutdown called 0.0.0.0/0.0.0.0:11468
[junit] 2017-10-21 23:30:47,339 [myid:] - INFO  [main:ZooKeeperServer@545] 
- shutting down
[junit] 2017-10-21 23:30:47,339 [myid:] - ERROR [main:ZooKeeperServer@509] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR or SHUTDOWN server state changes
[junit] 2017-10-21 23:30:47,339 [myid:] - INFO  
[main:SessionTrackerImpl@232] - Shutting down
[junit] 2017-10-21 23:30:47,339 [myid:] - INFO  
[main:PrepRequestProcessor@1010] - Shutting down
[junit] 2017-10-21 23:30:47,339 [myid:] - INFO  
[main:SyncRequestProcessor@191] - Shutting down
[junit] 2017-10-21 23:30:47,339 [myid:] - INFO  [ProcessThread(sid:0 
cport:11468)::PrepRequestProcessor@155] - PrepRequestProcessor exited loop!
[junit] 2017-10-21 23:30:47,339 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited!
[junit] 2017-10-21 23:30:47,340 [myid:] - INFO  
[main:FinalRequestProcessor@481] - shutdown of request processor complete
[junit] 2017-10-21 23:30:47,340 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean 
[org.apache.ZooKeeperService:name0=StandaloneServer_port11468,name1=InMemoryDataTree]
[junit] 2017-10-21 23:30:47,340 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port11468]
[junit] 2017-10-21 23:30:47,340 [myid:] - INFO  
[main:FourLetterWordMain@87] - connecting to 127.0.0.1 11468
[junit] 2017-10-21 23:30:47,340 [myid:] - INFO  [main:JMXEnv@146] - 
ensureOnly:[]
[junit] 2017-10-21 23:30:47,344 [myid:] - INFO  [main:ClientBase@626] - 
fdcount after test is: 4844 at start it was 4844
[junit] 2017-10-21 23:30:47,344 [myid:] - INFO  [main:ZKTestCase$1@68] - 
SUCCEEDED testWatcherAutoResetWithLocal
[junit] 2017-10-21 23:30:47,352 [myid:] - INFO  
[SessionTracker:SessionTrackerImpl@158] - SessionTrackerImpl exited loop!
[junit] 2017-10-21 23:30:47,352 [myid:] - INFO  
[SessionTracker:SessionTrackerImpl@158] - SessionTrackerImpl exited loop!
[junit] 2017-10-21 23:30:47,411 [myid:] - INFO  [main:ZKTestCase$1@63] - 
FINISHED testWatcherAutoResetWithLocal
[junit] Tests run: 105, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
500.802 sec, Thread: 1, Class: org.apache.zookeeper.test.NioNettySuiteTest

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1386: The 
following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1267: The 
following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1271: Tests 
failed!

Total time: 13 minutes 1 second
Build step 'Execute shell' marked build as failure
[FINDBUGS] Skipping publisher since build result is FAILURE
[WARNINGS] Skipping publisher since build result is FAILURE
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording fingerprints
Setting 

ZooKeeper-trunk-openjdk7 - Build # 1674 - Still Failing

2017-10-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1674/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 65.51 MB...]
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-10-21 20:07:33,249 [myid:127.0.0.1:11234] - INFO  
[main-SendThread(127.0.0.1:11234):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11234. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-10-21 20:07:33,250 [myid:127.0.0.1:11234] - INFO  
[main-SendThread(127.0.0.1:11234):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:40979, server: 
127.0.0.1/127.0.0.1:11234
[junit] 2017-10-21 20:07:33,251 [myid:4] - INFO  
[NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:11234:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:40979
[junit] 2017-10-21 20:07:33,773 [myid:127.0.0.1:11225] - INFO  
[main-SendThread(127.0.0.1:11225):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11225. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-10-21 20:07:33,774 [myid:127.0.0.1:11225] - WARN  
[main-SendThread(127.0.0.1:11225):ClientCnxn$SendThread@1235] - Session 
0x1042f7af8670001 for server 127.0.0.1/127.0.0.1:11225, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-10-21 20:07:34,393 [myid:127.0.0.1:11225] - INFO  
[main-SendThread(127.0.0.1:11225):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11225. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-10-21 20:07:34,395 [myid:127.0.0.1:11225] - WARN  
[main-SendThread(127.0.0.1:11225):ClientCnxn$SendThread@1235] - Session 
0x1042f7af867 for server 127.0.0.1/127.0.0.1:11225, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-10-21 20:07:34,428 [myid:127.0.0.1:11222] - INFO  
[main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11222. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-10-21 20:07:34,429 [myid:127.0.0.1:11222] - WARN  
[main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1235] - Session 
0x42f7af589 for server 127.0.0.1/127.0.0.1:11222, unexpected error, closing 
socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-10-21 20:07:34,770 [myid:127.0.0.1:11222] - INFO  
[main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11222. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-10-21 20:07:34,771 [myid:127.0.0.1:11222] - WARN  
[main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1235] - Session 
0x42f7af5890001 for server 127.0.0.1/127.0.0.1:11222, unexpected error, closing 
socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 

[jira] [Commented] (ZOOKEEPER-2913) testEphemeralNodeDeletion is flakey

2017-10-21 Thread Michelle Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214026#comment-16214026
 ] 

Michelle Tan commented on ZOOKEEPER-2913:
-

I tried to run the test on my machine, and it seems to work most of the time. 
Once or twice the test failed because the ephemeral node was not deleted... At 
the moment, I don't see it appearing on the Flaky Dashboard anymore. I also 
tried to use the zk client API to delete the ephemeral node and rerun the test, 
but it said the session had expired. I'm not quite sure how to delete an 
ephemeral node... Please advise on how I should proceed.
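
For reference, a minimal sketch of the ephemeral-node lifecycle with the 
standard ZooKeeper Java client (the connect string and znode path are 
illustrative assumptions). An ephemeral znode is tied to the session that 
created it, so once that session has expired it cannot be deleted through the 
same handle; the server removes it by itself when the session ends:
{code:java}
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class EphemeralDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // illustrative connect string; point it at the test server
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();  // wait until the session is established

        // ephemeral znode: lives only as long as this session
        zk.create("/ephemeral-demo", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        System.out.println("exists: " + (zk.exists("/ephemeral-demo", false) != null));

        // closing the session makes the server delete the znode
        zk.close();
    }
}
{code}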

> testEphemeralNodeDeletion is flakey
> ---
>
> Key: ZOOKEEPER-2913
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2913
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.4.10, 3.5.3, 3.4.11, 3.6.0
>Reporter: Patrick Hunt
>Assignee: Michelle Tan
>  Labels: newbie
> Fix For: 3.5.4, 3.6.0, 3.4.12
>
>
> testEphemeralNodeDeletion is showing up as flakey across a number of jobs.
> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html
> After session close ephemeral node must be deleted



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-10-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213980#comment-16213980
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2770:
---

Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r146108372
  
--- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
@@ -148,9 +173,47 @@ synchronized public void resetRequestCounters(){
         packetsReceived = 0;
         packetsSent = 0;
     }
+    synchronized public void resetNumRequestsAboveThresholdTime() {
+        numRequestsAboveThresholdTime = 0;
+    }
     synchronized public void reset() {
         resetLatency();
         resetRequestCounters();
+        resetNumRequestsAboveThresholdTime();
+    }
+
+    public void checkLatency(final ZooKeeperServer zks, Request request) {
+        long requestLatency = Time.currentElapsedTime() - request.createTime;
+        boolean enabledAndAboveThreshold = (requestWarnThresholdMs == 0) ||
+                (requestWarnThresholdMs > -1 && requestLatency > requestWarnThresholdMs);
+        if (enabledAndAboveThreshold) {
+            zks.serverStats().incNumRequestsAboveThresholdTime();
+            // Try acquiring lock only if not waiting
+            if(!waitForLoggingWarnThresholdMsg) {
--- End diff --

nit: space between `if` and `(`


> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 
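
A minimal sketch of the idea, loosely following the patch under review (the 
class shape, threshold value, and log format are illustrative assumptions, not 
the patch's actual API):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class SlowRequestLogSketch {
    private static final Logger LOG = LoggerFactory.getLogger(SlowRequestLogSketch.class);
    // hypothetical configured threshold; -1 would mean "disabled"
    private final long requestWarnThresholdMs = 1000;

    /** Call when a request finishes; createTimeMs is its arrival time. */
    void checkLatency(long createTimeMs, String clientInfo, int requestType) {
        long latencyMs = System.currentTimeMillis() - createTimeMs;
        if (requestWarnThresholdMs > -1 && latencyMs > requestWarnThresholdMs) {
            // analogous to HBase's responseTooSlow log line
            LOG.warn("Request exceeded threshold: client={}, type={}, latencyMs={}",
                    clientInfo, requestType, latencyMs);
        }
    }
}
{code}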



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log

2017-10-21 Thread eribeiro
Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r146108372
  
--- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
@@ -148,9 +173,47 @@ synchronized public void resetRequestCounters(){
         packetsReceived = 0;
         packetsSent = 0;
     }
+    synchronized public void resetNumRequestsAboveThresholdTime() {
+        numRequestsAboveThresholdTime = 0;
+    }
     synchronized public void reset() {
         resetLatency();
         resetRequestCounters();
+        resetNumRequestsAboveThresholdTime();
+    }
+
+    public void checkLatency(final ZooKeeperServer zks, Request request) {
+        long requestLatency = Time.currentElapsedTime() - request.createTime;
+        boolean enabledAndAboveThreshold = (requestWarnThresholdMs == 0) ||
+                (requestWarnThresholdMs > -1 && requestLatency > requestWarnThresholdMs);
+        if (enabledAndAboveThreshold) {
+            zks.serverStats().incNumRequestsAboveThresholdTime();
+            // Try acquiring lock only if not waiting
+            if(!waitForLoggingWarnThresholdMsg) {
--- End diff --

nit: space between `if` and `(`


---


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-10-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213978#comment-16213978
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2770:
---

Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r146108343
  
--- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
@@ -21,17 +21,32 @@
 
 
 import org.apache.zookeeper.common.Time;
+import org.apache.zookeeper.server.quorum.QuorumPeerConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Timer;
+import java.util.TimerTask;
 
 /**
  * Basic Server Statistics
  */
 public class ServerStats {
+    private static final Logger LOG = LoggerFactory.getLogger(ServerStats.class);
+
     private long packetsSent;
     private long packetsReceived;
     private long maxLatency;
     private long minLatency = Long.MAX_VALUE;
     private long totalLatency = 0;
     private long count = 0;
+    private long numRequestsAboveThresholdTime = 0;
+
+    final static long requestWarnThresholdMs = QuorumPeerConfig.getRequestWarnResponseThresholdMs();
+    final static Timer timer = new Timer();
+    volatile Boolean waitForLoggingWarnThresholdMsg = false;
--- End diff --

Oh, excuse me, I got it. It's because you are synchronizing on it, right? 
Well, it's not recommended (it's an anti-pattern) to synchronize on non-final fields. 
Didn't Findbugs raise a warning about this?
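
A minimal sketch of the usual remedy (class and field names here are 
illustrative): synchronize on a dedicated final lock object instead of on a 
mutable or boxed field, since reassigning the field means threads can end up 
holding locks on different objects:
{code:java}
public class WarnThresholdSketch {
    // dedicated, final lock: every thread always locks the same object
    private final Object warnLock = new Object();
    private volatile boolean waitForLoggingWarnThresholdMsg = false;

    void tryScheduleWarnLog() {
        synchronized (warnLock) {
            if (!waitForLoggingWarnThresholdMsg) {
                waitForLoggingWarnThresholdMsg = true;
                // ... schedule the threshold warning log message ...
            }
        }
    }
}
{code}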


> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log

2017-10-21 Thread eribeiro
Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r146108343
  
--- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
@@ -21,17 +21,32 @@
 
 
 import org.apache.zookeeper.common.Time;
+import org.apache.zookeeper.server.quorum.QuorumPeerConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Timer;
+import java.util.TimerTask;
 
 /**
  * Basic Server Statistics
  */
 public class ServerStats {
+    private static final Logger LOG = LoggerFactory.getLogger(ServerStats.class);
+
     private long packetsSent;
     private long packetsReceived;
     private long maxLatency;
     private long minLatency = Long.MAX_VALUE;
     private long totalLatency = 0;
     private long count = 0;
+    private long numRequestsAboveThresholdTime = 0;
+
+    final static long requestWarnThresholdMs = QuorumPeerConfig.getRequestWarnResponseThresholdMs();
+    final static Timer timer = new Timer();
+    volatile Boolean waitForLoggingWarnThresholdMsg = false;
--- End diff --

Oh, excuse me, I got it. It's because you are synchronizing on it, right? 
Well, it's not recommended (it's an anti-pattern) to synchronize on non-final fields. 
Didn't Findbugs raise a warning about this?


---


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-10-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213976#comment-16213976
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2770:
---

Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r146108249
  
--- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
@@ -21,17 +21,32 @@
 
 
 import org.apache.zookeeper.common.Time;
+import org.apache.zookeeper.server.quorum.QuorumPeerConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Timer;
+import java.util.TimerTask;
 
 /**
  * Basic Server Statistics
  */
 public class ServerStats {
+    private static final Logger LOG = LoggerFactory.getLogger(ServerStats.class);
+
     private long packetsSent;
     private long packetsReceived;
     private long maxLatency;
     private long minLatency = Long.MAX_VALUE;
     private long totalLatency = 0;
     private long count = 0;
+    private long numRequestsAboveThresholdTime = 0;
+
+    final static long requestWarnThresholdMs = QuorumPeerConfig.getRequestWarnResponseThresholdMs();
+    final static Timer timer = new Timer();
+    volatile Boolean waitForLoggingWarnThresholdMsg = false;
--- End diff --

What is the reason to make `waitForLoggingWarnThresholdMsg` a boxed type 
instead of a primitive `boolean`?
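
For context, a sketch of why the boxed type is risky here (assuming the field 
is only meant as a flag): autoboxing `false` yields the interned 
`Boolean.FALSE` instance, so synchronizing on the field shares a lock with any 
other code that locks that instance, and reassigning the field swaps the lock 
object mid-flight:
{code:java}
// conventional: a primitive volatile flag
private volatile boolean waitForLoggingWarnThresholdMsg = false;

// risky: "volatile Boolean flag = false" autoboxes to the interned
// Boolean.FALSE; synchronized (flag) then contends with every other
// user of Boolean.FALSE, and "flag = true" silently switches the
// code to a different lock object.
{code}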


> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log

2017-10-21 Thread eribeiro
Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r146108249
  
--- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
@@ -21,17 +21,32 @@
 
 
 import org.apache.zookeeper.common.Time;
+import org.apache.zookeeper.server.quorum.QuorumPeerConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Timer;
+import java.util.TimerTask;
 
 /**
  * Basic Server Statistics
  */
 public class ServerStats {
+    private static final Logger LOG = LoggerFactory.getLogger(ServerStats.class);
+
     private long packetsSent;
     private long packetsReceived;
     private long maxLatency;
     private long minLatency = Long.MAX_VALUE;
     private long totalLatency = 0;
     private long count = 0;
+    private long numRequestsAboveThresholdTime = 0;
+
+    final static long requestWarnThresholdMs = QuorumPeerConfig.getRequestWarnResponseThresholdMs();
+    final static Timer timer = new Timer();
+    volatile Boolean waitForLoggingWarnThresholdMsg = false;
--- End diff --

What is the reason to make `waitForLoggingWarnThresholdMsg` a boxed type 
instead of a primitive `boolean`?


---


[jira] [Comment Edited] (ZOOKEEPER-2921) fsyncWarningThresholdMS is applied on each getChannel().force() - also needed on entire commit

2017-10-21 Thread Edward Ribeiro (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213971#comment-16213971
 ] 

Edward Ribeiro edited comment on ZOOKEEPER-2921 at 10/21/17 3:34 PM:
-

Hi [~randgalt] and [~phunt] 

{quote}
That said having such a warning when request latency exceeds some threshold 
would also be useful.
{quote}

Isn't that what ZOOKEEPER-2770 is all about? Or did I misunderstand this issue? 
There's a corresponding stale patch already: 
https://github.com/apache/zookeeper/pull/307

Wdyt? Please, let me know.

Cheers!



was (Author: eribeiro):
Hi [~randgalt] and [~phunt] :smile:

{quote}
That said having such a warning when request latency exceeds some threshold 
would also be useful.
{quote}

Isn't that what ZOOKEEPER-2770 is all about? Or did I misunderstand this issue? 
There's a corresponding stale patch already: 
https://github.com/apache/zookeeper/pull/307

Wdyt? Please, let me know.

Cheers!


> fsyncWarningThresholdMS is applied on each getChannel().force() - also needed 
> on entire commit
> --
>
> Key: ZOOKEEPER-2921
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2921
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.5.3
>Reporter: Jordan Zimmerman
>Priority: Minor
>
> FileTxnLog.commit() has a warning when an individual sync takes longer than 
> {{fsyncWarningThresholdMS}}. However, it would also be useful to warn when 
> the entire commit operation takes longer than {{fsyncWarningThresholdMS}} as 
> this can cause client connection failures. Currently, commit() can take 
> longer than 2/3 of a session but still not log a warning.
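
A minimal sketch of the suggested extension (the class shape, field names, and 
threshold value are illustrative, not the actual FileTxnLog code): keep the 
existing per-force() warning, but also time the commit as a whole:
{code:java}
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class CommitTimingSketch {
    private static final Logger LOG = LoggerFactory.getLogger(CommitTimingSketch.class);
    private final long fsyncWarningThresholdMS = 1000;   // illustrative threshold
    private List<FileOutputStream> streamsToFlush;       // illustrative field

    public void commit() throws IOException {
        long commitStart = System.currentTimeMillis();   // start of the whole commit
        for (FileOutputStream log : streamsToFlush) {
            log.flush();
            long syncStart = System.currentTimeMillis();
            log.getChannel().force(false);               // fsync this log segment
            long syncElapsed = System.currentTimeMillis() - syncStart;
            if (syncElapsed > fsyncWarningThresholdMS) {
                LOG.warn("fsync took {} ms", syncElapsed);        // existing-style warning
            }
        }
        long commitElapsed = System.currentTimeMillis() - commitStart;
        if (commitElapsed > fsyncWarningThresholdMS) {
            LOG.warn("entire commit took {} ms", commitElapsed);  // proposed warning
        }
    }
}
{code}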



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2921) fsyncWarningThresholdMS is applied on each getChannel().force() - also needed on entire commit

2017-10-21 Thread Edward Ribeiro (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213971#comment-16213971
 ] 

Edward Ribeiro commented on ZOOKEEPER-2921:
---

Hi [~randgalt] and [~phunt] :smile:

{quote}
That said having such a warning when request latency exceeds some threshold 
would also be useful.
{quote}

Isn't that what ZOOKEEPER-2770 is all about? Or did I misunderstand this issue? 
There's a corresponding stale patch already: 
https://github.com/apache/zookeeper/pull/307

Wdyt? Please, let me know.

Cheers!


> fsyncWarningThresholdMS is applied on each getChannel().force() - also needed 
> on entire commit
> --
>
> Key: ZOOKEEPER-2921
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2921
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.5.3
>Reporter: Jordan Zimmerman
>Priority: Minor
>
> FileTxnLog.commit() has a warning when an individual sync takes longer than 
> {{fsyncWarningThresholdMS}}. However, it would also be useful to warn when 
> the entire commit operation takes longer than {{fsyncWarningThresholdMS}} as 
> this can cause client connection failures. Currently, commit() can take 
> longer than 2/3 of a session but still not log a warning.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


ZooKeeper_branch35_jdk8 - Build # 724 - Failure

2017-10-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/724/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 68.35 MB...]
[junit] 2017-10-21 12:09:49,111 [myid:127.0.0.1:16737] - INFO  
[main-SendThread(127.0.0.1:16737):ClientCnxn$SendThread@1231] - channel for 
sessionid 0x30884ab5217 is lost, closing socket connection and attempting 
reconnect
[junit] 2017-10-21 12:09:49,552 [myid:127.0.0.1:16657] - INFO  
[main-SendThread(127.0.0.1:16657):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:16657. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-10-21 12:09:49,556 [myid:] - WARN  [New I/O boss 
#1341:ClientCnxnSocketNetty$ZKClientHandler@439] - Exception caught: [id: 
0x4858c95a] EXCEPTION: java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:16657
[junit] java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:16657
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[junit] at java.lang.Thread.run(Thread.java:748)
[junit] 2017-10-21 12:09:49,556 [myid:] - INFO  [New I/O boss 
#1341:ClientCnxnSocketNetty@208] - channel is told closing
[junit] 2017-10-21 12:09:49,556 [myid:127.0.0.1:16657] - INFO  
[main-SendThread(127.0.0.1:16657):ClientCnxnSocketNetty$1@127] - future isn't 
success, cause: {}
[junit] java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:16657
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
[junit] at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[junit] at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[junit] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[junit] at java.lang.Thread.run(Thread.java:748)
[junit] 2017-10-21 12:09:49,557 [myid:127.0.0.1:16657] - INFO  
[main-SendThread(127.0.0.1:16657):ClientCnxn$SendThread@1231] - channel for 
sessionid 0x10884a852a1 is lost, closing socket connection and attempting 
reconnect
[junit] 2017-10-21 12:09:49,591 [myid:127.0.0.1:16672] - INFO  
[main-SendThread(127.0.0.1:16672):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:16672. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-10-21 12:09:49,592 [myid:] - INFO  [New I/O boss 
#1971:ClientCnxnSocketNetty$1@127] - future isn't success, cause: {}
[junit] java.net.ConnectException: Connection refused: 
127.0.0.1/127.0.0.1:16672
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
[junit] at 
org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
[junit] at