[jira] [Commented] (ZOOKEEPER-2910) zookeeper listening on unknown random port
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214154#comment-16214154 ]

ken huang commented on ZOOKEEPER-2910:
--------------------------------------

As you say, you are running three servers on a single machine (17/18/19), so there should be 9 ports listening on that machine, as follows:

{code:java}
for zk17: 2181 3181 4181
for zk18: 2182 3182 4182
for zk19: 2183 3183 4183
{code}

But I just found that zk19 is fine, while zk18 and zk17 are missing the port that followers use to connect to the leader. I'm not sure whether my understanding is correct.

> zookeeper listening on unknown random port
> -------------------------------------------
>
>                 Key: ZOOKEEPER-2910
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2910
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.8
>        Environment: CentOS 7
>                     Zookeeper 3.4.8
>                     JDK 1.8.0_111
>           Reporter: ken huang
>           Assignee: Patrick Hunt
>
> When ZooKeeper starts, it listens on three ports:
> 2181 for client connections
> 3888 for leader election
> a random port, for what?
> There are three ports configured in zoo.cfg (2181, 2888, and 3888), but nothing is listening on 2888.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
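For context on the port counts discussed above: only the elected leader binds the first (quorum) port of each server.N entry; followers merely connect out to it, which is why a non-leader instance appears to be "missing" that listener. A minimal sketch of a three-instance-on-one-host layout matching the ports in the comment (the paths and exact values here are illustrative assumptions, not taken from the reporter's actual config):

{code}
# zoo.cfg shared by zk17/zk18/zk19 (myid 1..3 on one machine).
# Each server.N entry names two ports: the quorum port
# (followers connect to the leader on it) and the election port.
# Only the current leader listens on its quorum port; every
# server listens on its own election port.
server.1=127.0.0.1:3181:4181
server.2=127.0.0.1:3182:4182
server.3=127.0.0.1:3183:4183

# clientPort differs per instance's own zoo.cfg: 2181/2182/2183.
clientPort=2181
dataDir=/var/lib/zookeeper/zk17
{code}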
ZooKeeper-trunk - Build # 3586 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk/3586/

###############################################################################
## LAST 60 LINES OF THE CONSOLE ##
###############################################################################
[...truncated 62.46 MB...]
    [junit] 	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    [junit] 	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    [junit] 	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    [junit] 	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    [junit] 	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    [junit] 	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    [junit] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    [junit] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    [junit] 	at java.lang.Thread.run(Thread.java:745)
    [junit] 2017-10-21 23:30:47,333 [myid:] - INFO [main:ZooKeeper@1332] - Session: 0x1022decb9c9 closed
    [junit] 2017-10-21 23:30:47,333 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 149849
    [junit] 2017-10-21 23:30:47,333 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 1648
    [junit] 2017-10-21 23:30:47,333 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD testWatcherAutoResetWithLocal
    [junit] 2017-10-21 23:30:47,333 [myid:] - INFO [main:ClientBase@601] - tearDown starting
    [junit] 2017-10-21 23:30:47,333 [myid:] - INFO [main:ClientBase@571] - STOPPING server
    [junit] 2017-10-21 23:30:47,334 [myid:] - INFO [main:NettyServerCnxnFactory@464] - shutdown called 0.0.0.0/0.0.0.0:11468
    [junit] 2017-10-21 23:30:47,339 [myid:] - INFO [main:ZooKeeperServer@545] - shutting down
    [junit] 2017-10-21 23:30:47,339 [myid:] - ERROR [main:ZooKeeperServer@509] - ZKShutdownHandler is not registered, so ZooKeeper server won't take any action on ERROR or SHUTDOWN server state changes
    [junit] 2017-10-21 23:30:47,339 [myid:] - INFO [main:SessionTrackerImpl@232] - Shutting down
    [junit] 2017-10-21 23:30:47,339 [myid:] - INFO [main:PrepRequestProcessor@1010] - Shutting down
    [junit] 2017-10-21 23:30:47,339 [myid:] - INFO [main:SyncRequestProcessor@191] - Shutting down
    [junit] 2017-10-21 23:30:47,339 [myid:] - INFO [ProcessThread(sid:0 cport:11468)::PrepRequestProcessor@155] - PrepRequestProcessor exited loop!
    [junit] 2017-10-21 23:30:47,339 [myid:] - INFO [SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited!
    [junit] 2017-10-21 23:30:47,340 [myid:] - INFO [main:FinalRequestProcessor@481] - shutdown of request processor complete
    [junit] 2017-10-21 23:30:47,340 [myid:] - INFO [main:MBeanRegistry@128] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port11468,name1=InMemoryDataTree]
    [junit] 2017-10-21 23:30:47,340 [myid:] - INFO [main:MBeanRegistry@128] - Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port11468]
    [junit] 2017-10-21 23:30:47,340 [myid:] - INFO [main:FourLetterWordMain@87] - connecting to 127.0.0.1 11468
    [junit] 2017-10-21 23:30:47,340 [myid:] - INFO [main:JMXEnv@146] - ensureOnly:[]
    [junit] 2017-10-21 23:30:47,344 [myid:] - INFO [main:ClientBase@626] - fdcount after test is: 4844 at start it was 4844
    [junit] 2017-10-21 23:30:47,344 [myid:] - INFO [main:ZKTestCase$1@68] - SUCCEEDED testWatcherAutoResetWithLocal
    [junit] 2017-10-21 23:30:47,352 [myid:] - INFO [SessionTracker:SessionTrackerImpl@158] - SessionTrackerImpl exited loop!
    [junit] 2017-10-21 23:30:47,411 [myid:] - INFO [main:ZKTestCase$1@63] - FINISHED testWatcherAutoResetWithLocal
    [junit] Tests run: 105, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 500.802 sec, Thread: 1, Class: org.apache.zookeeper.test.NioNettySuiteTest

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1386: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1267: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1271: Tests failed!

Total time: 13 minutes 1 second
Build step 'Execute shell' marked build as failure
[FINDBUGS] Skipping publisher since build result is FAILURE
[WARNINGS] Skipping publisher since build result is FAILURE
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording fingerprints
Setting
ZooKeeper-trunk-openjdk7 - Build # 1674 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1674/

###############################################################################
## LAST 60 LINES OF THE CONSOLE ##
###############################################################################
[...truncated 65.51 MB...]
    [junit] 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    [junit] 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
    [junit] 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
    [junit] 2017-10-21 20:07:33,249 [myid:127.0.0.1:11234] - INFO [main-SendThread(127.0.0.1:11234):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:11234. Will not attempt to authenticate using SASL (unknown error)
    [junit] 2017-10-21 20:07:33,250 [myid:127.0.0.1:11234] - INFO [main-SendThread(127.0.0.1:11234):ClientCnxn$SendThread@946] - Socket connection established, initiating session, client: /127.0.0.1:40979, server: 127.0.0.1/127.0.0.1:11234
    [junit] 2017-10-21 20:07:33,251 [myid:4] - INFO [NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:11234:NIOServerCnxnFactory$AcceptThread@296] - Accepted socket connection from /127.0.0.1:40979
    [junit] 2017-10-21 20:07:33,773 [myid:127.0.0.1:11225] - INFO [main-SendThread(127.0.0.1:11225):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:11225. Will not attempt to authenticate using SASL (unknown error)
    [junit] 2017-10-21 20:07:33,774 [myid:127.0.0.1:11225] - WARN [main-SendThread(127.0.0.1:11225):ClientCnxn$SendThread@1235] - Session 0x1042f7af8670001 for server 127.0.0.1/127.0.0.1:11225, unexpected error, closing socket connection and attempting reconnect
    [junit] java.net.ConnectException: Connection refused
    [junit] 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    [junit] 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    [junit] 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
    [junit] 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
    [junit] 2017-10-21 20:07:34,393 [myid:127.0.0.1:11225] - INFO [main-SendThread(127.0.0.1:11225):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:11225. Will not attempt to authenticate using SASL (unknown error)
    [junit] 2017-10-21 20:07:34,395 [myid:127.0.0.1:11225] - WARN [main-SendThread(127.0.0.1:11225):ClientCnxn$SendThread@1235] - Session 0x1042f7af867 for server 127.0.0.1/127.0.0.1:11225, unexpected error, closing socket connection and attempting reconnect
    [junit] java.net.ConnectException: Connection refused
    [junit] 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    [junit] 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    [junit] 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
    [junit] 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
    [junit] 2017-10-21 20:07:34,428 [myid:127.0.0.1:11222] - INFO [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:11222. Will not attempt to authenticate using SASL (unknown error)
    [junit] 2017-10-21 20:07:34,429 [myid:127.0.0.1:11222] - WARN [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1235] - Session 0x42f7af589 for server 127.0.0.1/127.0.0.1:11222, unexpected error, closing socket connection and attempting reconnect
    [junit] java.net.ConnectException: Connection refused
    [junit] 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    [junit] 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    [junit] 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
    [junit] 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
    [junit] 2017-10-21 20:07:34,770 [myid:127.0.0.1:11222] - INFO [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:11222. Will not attempt to authenticate using SASL (unknown error)
    [junit] 2017-10-21 20:07:34,771 [myid:127.0.0.1:11222] - WARN [main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1235] - Session 0x42f7af5890001 for server 127.0.0.1/127.0.0.1:11222, unexpected error, closing socket connection and attempting reconnect
    [junit] java.net.ConnectException: Connection refused
    [junit] 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    [junit] 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    [junit] 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
    [junit] 	at
[jira] [Commented] (ZOOKEEPER-2913) testEphemeralNodeDeletion is flakey
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214026#comment-16214026 ]

Michelle Tan commented on ZOOKEEPER-2913:
-----------------------------------------

I tried to run the test on my machine, and it seems to work most of the time. There were one or two times when the test failed because the ephemeral node was not deleted... At the moment, I don't see it appearing on the Flaky Dashboard anymore.

I also tried to use the zk client API to delete the ephemeral node and then run the test, but it said the session had expired. I'm not quite sure how to delete an ephemeral node... Please advise on how I should proceed.

> testEphemeralNodeDeletion is flakey
> -----------------------------------
>
>                 Key: ZOOKEEPER-2913
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2913
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: tests
>    Affects Versions: 3.4.10, 3.5.3, 3.4.11, 3.6.0
>            Reporter: Patrick Hunt
>            Assignee: Michelle Tan
>              Labels: newbie
>             Fix For: 3.5.4, 3.6.0, 3.4.12
>
> testEphemeralNodeDeletion is showing up as flakey across a number of jobs.
> https://builds.apache.org/view/S-Z/view/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html
> After session close the ephemeral node must be deleted.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
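On the question above: an ephemeral node is normally not deleted by a second client at all; the server removes it automatically when the owning session closes or expires, and an explicit delete() only works from a session that is still alive (which is why a delete after session expiry fails). A minimal sketch of both paths using the standard Java client; the connect string, timeout, and path are illustrative assumptions:

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class EphemeralDemo {
    public static void main(String[] args) throws Exception {
        // Connect string and session timeout are illustrative.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {});

        // Ephemeral znodes are bound to this client's session.
        zk.create("/ephemeral-demo", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Option 1: delete explicitly from the owning, still-live session
        // (version -1 matches any version).
        // zk.delete("/ephemeral-demo", -1);

        // Option 2: close the session; the server then deletes every
        // ephemeral node this session owned.
        zk.close();
    }
}
{code}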
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213980#comment-16213980 ]

ASF GitHub Bot commented on ZOOKEEPER-2770:
-------------------------------------------

Github user eribeiro commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/307#discussion_r146108372

    --- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
    @@ -148,9 +173,47 @@ synchronized public void resetRequestCounters(){
             packetsReceived = 0;
             packetsSent = 0;
         }
    +    synchronized public void resetNumRequestsAboveThresholdTime() {
    +        numRequestsAboveThresholdTime = 0;
    +    }
         synchronized public void reset() {
             resetLatency();
             resetRequestCounters();
    +        resetNumRequestsAboveThresholdTime();
    +    }
    +
    +    public void checkLatency(final ZooKeeperServer zks, Request request) {
    +        long requestLatency = Time.currentElapsedTime() - request.createTime;
    +        boolean enabledAndAboveThreshold = (requestWarnThresholdMs == 0) ||
    +                (requestWarnThresholdMs > -1 && requestLatency > requestWarnThresholdMs);
    +        if (enabledAndAboveThreshold) {
    +            zks.serverStats().incNumRequestsAboveThresholdTime();
    +            // Try acquiring lock only if not waiting
    +            if(!waitForLoggingWarnThresholdMsg) {
    --- End diff --

    nit: space between `if` and `(`

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, ZOOKEEPER-2770.003.patch
>
> ZooKeeper is a complex distributed application. There are many reasons why any given read or write operation may become slow: a software bug, a protocol problem, a hardware issue with the commit log(s), a network issue. If the problem is constant it is trivial to come to an understanding of the cause. However in order to diagnose intermittent problems we often don't know where, or when, to begin looking. We need some sort of timestamped indication of the problem. Although ZooKeeper is not a datastore, it does persist data, and can suffer intermittent performance degradation, and should consider implementing a 'slow query' log, a feature very common to services which persist information on behalf of clients which may be sensitive to latency while waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally processing the request, that the current time minus arrival time of the request is beyond a configured threshold.
> Look at the HBase {{responseTooSlow}} feature for inspiration.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log
Github user eribeiro commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/307#discussion_r146108372

    --- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
    @@ -148,9 +173,47 @@ synchronized public void resetRequestCounters(){
             packetsReceived = 0;
             packetsSent = 0;
         }
    +    synchronized public void resetNumRequestsAboveThresholdTime() {
    +        numRequestsAboveThresholdTime = 0;
    +    }
         synchronized public void reset() {
             resetLatency();
             resetRequestCounters();
    +        resetNumRequestsAboveThresholdTime();
    +    }
    +
    +    public void checkLatency(final ZooKeeperServer zks, Request request) {
    +        long requestLatency = Time.currentElapsedTime() - request.createTime;
    +        boolean enabledAndAboveThreshold = (requestWarnThresholdMs == 0) ||
    +                (requestWarnThresholdMs > -1 && requestLatency > requestWarnThresholdMs);
    +        if (enabledAndAboveThreshold) {
    +            zks.serverStats().incNumRequestsAboveThresholdTime();
    +            // Try acquiring lock only if not waiting
    +            if(!waitForLoggingWarnThresholdMsg) {
    --- End diff --

    nit: space between `if` and `(`

---
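The threshold check in the diff above reads a little opaquely: requestWarnThresholdMs == 0 means "warn on every request", -1 disables the feature, and any positive value is the actual threshold in milliseconds. A standalone sketch of that semantics (names follow the patch, but the logging is simplified and the patch's rate-limiting lock is omitted, so this is an illustration rather than the actual code):

{code:java}
// Minimal sketch of the slow-request check from the patch above.
// Threshold semantics: -1 = disabled, 0 = report every request,
// >0 = report requests slower than the threshold (milliseconds).
public final class SlowRequestCheck {
    private final long requestWarnThresholdMs;

    public SlowRequestCheck(long requestWarnThresholdMs) {
        this.requestWarnThresholdMs = requestWarnThresholdMs;
    }

    /** Returns true when the request should be reported as slow. */
    public boolean isSlow(long requestCreateTimeMs, long nowMs) {
        long latency = nowMs - requestCreateTimeMs;
        return requestWarnThresholdMs == 0
                || (requestWarnThresholdMs > -1 && latency > requestWarnThresholdMs);
    }

    public static void main(String[] args) {
        SlowRequestCheck check = new SlowRequestCheck(300);
        long created = System.currentTimeMillis() - 450;  // arrived 450 ms ago
        if (check.isSlow(created, System.currentTimeMillis())) {
            System.out.println("request exceeded 300 ms threshold");
        }
    }
}
{code}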
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213978#comment-16213978 ]

ASF GitHub Bot commented on ZOOKEEPER-2770:
-------------------------------------------

Github user eribeiro commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/307#discussion_r146108343

    --- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
    @@ -21,17 +21,32 @@
     import org.apache.zookeeper.common.Time;
    +import org.apache.zookeeper.server.quorum.QuorumPeerConfig;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.util.Timer;
    +import java.util.TimerTask;

     /**
      * Basic Server Statistics
      */
     public class ServerStats {
    +    private static final Logger LOG = LoggerFactory.getLogger(ServerStats.class);
    +
         private long packetsSent;
         private long packetsReceived;
         private long maxLatency;
         private long minLatency = Long.MAX_VALUE;
         private long totalLatency = 0;
         private long count = 0;
    +    private long numRequestsAboveThresholdTime = 0;
    +
    +    final static long requestWarnThresholdMs = QuorumPeerConfig.getRequestWarnResponseThresholdMs();
    +    final static Timer timer = new Timer();
    +    volatile Boolean waitForLoggingWarnThresholdMsg = false;
    --- End diff --

    Oh, excuse me, I got it. It's because you are synchronizing on it, right? Well, it's not recommended (it's an anti-pattern) to synchronize on non-final fields. Didn't Findbugs raise a warning about this?

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, ZOOKEEPER-2770.003.patch
>
> ZooKeeper is a complex distributed application. There are many reasons why any given read or write operation may become slow: a software bug, a protocol problem, a hardware issue with the commit log(s), a network issue. If the problem is constant it is trivial to come to an understanding of the cause. However in order to diagnose intermittent problems we often don't know where, or when, to begin looking. We need some sort of timestamped indication of the problem. Although ZooKeeper is not a datastore, it does persist data, and can suffer intermittent performance degradation, and should consider implementing a 'slow query' log, a feature very common to services which persist information on behalf of clients which may be sensitive to latency while waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally processing the request, that the current time minus arrival time of the request is beyond a configured threshold.
> Look at the HBase {{responseTooSlow}} feature for inspiration.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log
Github user eribeiro commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/307#discussion_r146108343

    --- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
    @@ -21,17 +21,32 @@
     import org.apache.zookeeper.common.Time;
    +import org.apache.zookeeper.server.quorum.QuorumPeerConfig;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.util.Timer;
    +import java.util.TimerTask;

     /**
      * Basic Server Statistics
      */
     public class ServerStats {
    +    private static final Logger LOG = LoggerFactory.getLogger(ServerStats.class);
    +
         private long packetsSent;
         private long packetsReceived;
         private long maxLatency;
         private long minLatency = Long.MAX_VALUE;
         private long totalLatency = 0;
         private long count = 0;
    +    private long numRequestsAboveThresholdTime = 0;
    +
    +    final static long requestWarnThresholdMs = QuorumPeerConfig.getRequestWarnResponseThresholdMs();
    +    final static Timer timer = new Timer();
    +    volatile Boolean waitForLoggingWarnThresholdMsg = false;
    --- End diff --

    Oh, excuse me, I got it. It's because you are synchronizing on it, right? Well, it's not recommended (it's an anti-pattern) to synchronize on non-final fields. Didn't Findbugs raise a warning about this?

---
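To make the reviewer's point concrete: synchronizing on a mutable Boolean field is broken because assigning true/false autoboxes to the interned Boolean.TRUE/Boolean.FALSE instances, so the lock object silently changes under the waiting threads (and is shared JVM-wide). A hedged sketch of the pitfall and two conventional fixes; this is an illustration, not the patch's actual code:

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

public class LockFieldDemo {
    // Broken: every assignment rebinds the field to the interned
    // Boolean.TRUE / Boolean.FALSE, so two threads can easily end up
    // synchronizing on different objects (no mutual exclusion).
    private volatile Boolean waiting = false;

    public void brokenToggle() {
        synchronized (waiting) {      // lock object changes on the next line!
            waiting = !waiting;
        }
    }

    // Fix 1: guard the flag with a dedicated final lock object,
    // whose identity never changes.
    private final Object lock = new Object();
    private boolean waitingFlag = false;

    public void safeToggle() {
        synchronized (lock) {
            waitingFlag = !waitingFlag;
        }
    }

    // Fix 2: drop locking entirely; compareAndSet gives the
    // "only one thread proceeds" behavior without synchronized.
    private final AtomicBoolean waitingAtomic = new AtomicBoolean(false);

    public boolean tryStartLogging() {
        return waitingAtomic.compareAndSet(false, true);
    }
}
{code}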
[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213976#comment-16213976 ]

ASF GitHub Bot commented on ZOOKEEPER-2770:
-------------------------------------------

Github user eribeiro commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/307#discussion_r146108249

    --- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
    @@ -21,17 +21,32 @@
     import org.apache.zookeeper.common.Time;
    +import org.apache.zookeeper.server.quorum.QuorumPeerConfig;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.util.Timer;
    +import java.util.TimerTask;

     /**
      * Basic Server Statistics
      */
     public class ServerStats {
    +    private static final Logger LOG = LoggerFactory.getLogger(ServerStats.class);
    +
         private long packetsSent;
         private long packetsReceived;
         private long maxLatency;
         private long minLatency = Long.MAX_VALUE;
         private long totalLatency = 0;
         private long count = 0;
    +    private long numRequestsAboveThresholdTime = 0;
    +
    +    final static long requestWarnThresholdMs = QuorumPeerConfig.getRequestWarnResponseThresholdMs();
    +    final static Timer timer = new Timer();
    +    volatile Boolean waitForLoggingWarnThresholdMsg = false;
    --- End diff --

    What is the reason to make `waitForLoggingWarnThresholdMsg` a boxed type instead of a primitive `boolean`?

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, ZOOKEEPER-2770.003.patch
>
> ZooKeeper is a complex distributed application. There are many reasons why any given read or write operation may become slow: a software bug, a protocol problem, a hardware issue with the commit log(s), a network issue. If the problem is constant it is trivial to come to an understanding of the cause. However in order to diagnose intermittent problems we often don't know where, or when, to begin looking. We need some sort of timestamped indication of the problem. Although ZooKeeper is not a datastore, it does persist data, and can suffer intermittent performance degradation, and should consider implementing a 'slow query' log, a feature very common to services which persist information on behalf of clients which may be sensitive to latency while waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally processing the request, that the current time minus arrival time of the request is beyond a configured threshold.
> Look at the HBase {{responseTooSlow}} feature for inspiration.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log
Github user eribeiro commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/307#discussion_r146108249

    --- Diff: src/java/main/org/apache/zookeeper/server/ServerStats.java ---
    @@ -21,17 +21,32 @@
     import org.apache.zookeeper.common.Time;
    +import org.apache.zookeeper.server.quorum.QuorumPeerConfig;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.util.Timer;
    +import java.util.TimerTask;

     /**
      * Basic Server Statistics
      */
     public class ServerStats {
    +    private static final Logger LOG = LoggerFactory.getLogger(ServerStats.class);
    +
         private long packetsSent;
         private long packetsReceived;
         private long maxLatency;
         private long minLatency = Long.MAX_VALUE;
         private long totalLatency = 0;
         private long count = 0;
    +    private long numRequestsAboveThresholdTime = 0;
    +
    +    final static long requestWarnThresholdMs = QuorumPeerConfig.getRequestWarnResponseThresholdMs();
    +    final static Timer timer = new Timer();
    +    volatile Boolean waitForLoggingWarnThresholdMsg = false;
    --- End diff --

    What is the reason to make `waitForLoggingWarnThresholdMsg` a boxed type instead of a primitive `boolean`?

---
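For a flag that is only read and written (not used as a monitor), a primitive volatile boolean is the usual answer to the question above: it gives the same cross-thread visibility guarantee while removing the object identity that invites accidental locking. A brief sketch under that assumption:

{code:java}
public class VolatileFlagDemo {
    // Preferred for a simple visibility flag: writes by one thread are
    // immediately visible to readers, and there is no boxed object to
    // accidentally synchronize on. Note that volatile alone does not
    // make check-then-act atomic; use AtomicBoolean for that.
    private volatile boolean waitForLoggingWarnThresholdMsg = false;

    public void startWaiting() {
        waitForLoggingWarnThresholdMsg = true;
    }

    public boolean isWaiting() {
        return waitForLoggingWarnThresholdMsg;
    }
}
{code}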
[jira] [Comment Edited] (ZOOKEEPER-2921) fsyncWarningThresholdMS is applied on each getChannel().force() - also needed on entire commit
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213971#comment-16213971 ]

Edward Ribeiro edited comment on ZOOKEEPER-2921 at 10/21/17 3:34 PM:
---------------------------------------------------------------------

Hi [~randgalt] and [~phunt],

{quote}
That said having such a warning when request latency exceeds some threshold would also be useful.
{quote}

Isn't that what ZOOKEEPER-2770 is all about? Or did I misunderstand this issue? There's a corresponding stale patch already: https://github.com/apache/zookeeper/pull/307

Wdyt? Please, let me know. Cheers!


was (Author: eribeiro):
Hi [~randgalt] and [~phunt] :smile:

{quote}
That said having such a warning when request latency exceeds some threshold would also be useful.
{quote}

Isn't that what ZOOKEEPER-2770 is all about? Or did I misunderstand this issue? There's a corresponding stale patch already: https://github.com/apache/zookeeper/pull/307

Wdyt? Please, let me know. Cheers!

> fsyncWarningThresholdMS is applied on each getChannel().force() - also needed on entire commit
> ----------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2921
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2921
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.5.3
>            Reporter: Jordan Zimmerman
>            Priority: Minor
>
> FileTxnLog.commit() has a warning when an individual sync takes longer than {{fsyncWarningThresholdMS}}. However, it would also be useful to warn when the entire commit operation takes longer than {{fsyncWarningThresholdMS}} as this can cause client connection failures. Currently, commit() can take longer than 2/3 of a session but still not log a warning.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ZOOKEEPER-2921) fsyncWarningThresholdMS is applied on each getChannel().force() - also needed on entire commit
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213971#comment-16213971 ]

Edward Ribeiro commented on ZOOKEEPER-2921:
-------------------------------------------

Hi [~randgalt] and [~phunt] :smile:

{quote}
That said having such a warning when request latency exceeds some threshold would also be useful.
{quote}

Isn't that what ZOOKEEPER-2770 is all about? Or did I misunderstand this issue? There's a corresponding stale patch already: https://github.com/apache/zookeeper/pull/307

Wdyt? Please, let me know. Cheers!

> fsyncWarningThresholdMS is applied on each getChannel().force() - also needed on entire commit
> ----------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2921
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2921
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.5.3
>            Reporter: Jordan Zimmerman
>            Priority: Minor
>
> FileTxnLog.commit() has a warning when an individual sync takes longer than {{fsyncWarningThresholdMS}}. However, it would also be useful to warn when the entire commit operation takes longer than {{fsyncWarningThresholdMS}} as this can cause client connection failures. Currently, commit() can take longer than 2/3 of a session but still not log a warning.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
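The request in this issue is to time the whole FileTxnLog.commit() pass rather than only each getChannel().force() call: a commit that flushes several log streams can keep every individual fsync under fsyncWarningThresholdMS yet blow past it in aggregate, silently eating a large slice of the session timeout. A hedged sketch of that idea; the class, field, and method names here are illustrative assumptions shaped like ZooKeeper's internals, not the actual implementation:

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.List;

// Illustrative only: mirrors the shape of FileTxnLog.commit(),
// adding a warning for the total commit duration on top of the
// existing per-fsync warning.
public class CommitTimingSketch {
    private final long fsyncWarningThresholdMs;

    public CommitTimingSketch(long fsyncWarningThresholdMs) {
        this.fsyncWarningThresholdMs = fsyncWarningThresholdMs;
    }

    public void commit(List<FileChannel> streamsToFlush) throws IOException {
        long commitStart = System.currentTimeMillis();
        for (FileChannel channel : streamsToFlush) {
            long syncStart = System.currentTimeMillis();
            channel.force(false);  // flush this log stream to disk
            long syncElapsed = System.currentTimeMillis() - syncStart;
            // Existing behavior: warn on each slow fsync individually.
            if (syncElapsed > fsyncWarningThresholdMs) {
                System.err.println("fsync took " + syncElapsed + " ms");
            }
        }
        // Proposed addition: warn on the *entire* commit as well, since
        // N individually fast fsyncs can still add up past the threshold.
        long commitElapsed = System.currentTimeMillis() - commitStart;
        if (commitElapsed > fsyncWarningThresholdMs) {
            System.err.println("commit took " + commitElapsed + " ms total");
        }
    }
}
{code}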
ZooKeeper_branch35_jdk8 - Build # 724 - Failure
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/724/

###############################################################################
## LAST 60 LINES OF THE CONSOLE ##
###############################################################################
[...truncated 68.35 MB...]
    [junit] 2017-10-21 12:09:49,111 [myid:127.0.0.1:16737] - INFO [main-SendThread(127.0.0.1:16737):ClientCnxn$SendThread@1231] - channel for sessionid 0x30884ab5217 is lost, closing socket connection and attempting reconnect
    [junit] 2017-10-21 12:09:49,552 [myid:127.0.0.1:16657] - INFO [main-SendThread(127.0.0.1:16657):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:16657. Will not attempt to authenticate using SASL (unknown error)
    [junit] 2017-10-21 12:09:49,556 [myid:] - WARN [New I/O boss #1341:ClientCnxnSocketNetty$ZKClientHandler@439] - Exception caught: [id: 0x4858c95a] EXCEPTION: java.net.ConnectException: Connection refused: 127.0.0.1/127.0.0.1:16657
    [junit] java.net.ConnectException: Connection refused: 127.0.0.1/127.0.0.1:16657
    [junit] 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    [junit] 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    [junit] 	at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
    [junit] 	at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    [junit] 	at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    [junit] 	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    [junit] 	at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    [junit] 	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    [junit] 	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    [junit] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    [junit] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    [junit] 	at java.lang.Thread.run(Thread.java:748)
    [junit] 2017-10-21 12:09:49,556 [myid:] - INFO [New I/O boss #1341:ClientCnxnSocketNetty@208] - channel is told closing
    [junit] 2017-10-21 12:09:49,556 [myid:127.0.0.1:16657] - INFO [main-SendThread(127.0.0.1:16657):ClientCnxnSocketNetty$1@127] - future isn't success, cause: {}
    [junit] java.net.ConnectException: Connection refused: 127.0.0.1/127.0.0.1:16657
    [junit] 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    [junit] 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    [junit] 	at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
    [junit] 	at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    [junit] 	at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    [junit] 	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    [junit] 	at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    [junit] 	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    [junit] 	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    [junit] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    [junit] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    [junit] 	at java.lang.Thread.run(Thread.java:748)
    [junit] 2017-10-21 12:09:49,557 [myid:127.0.0.1:16657] - INFO [main-SendThread(127.0.0.1:16657):ClientCnxn$SendThread@1231] - channel for sessionid 0x10884a852a1 is lost, closing socket connection and attempting reconnect
    [junit] 2017-10-21 12:09:49,591 [myid:127.0.0.1:16672] - INFO [main-SendThread(127.0.0.1:16672):ClientCnxn$SendThread@1113] - Opening socket connection to server 127.0.0.1/127.0.0.1:16672. Will not attempt to authenticate using SASL (unknown error)
    [junit] 2017-10-21 12:09:49,592 [myid:] - INFO [New I/O boss #1971:ClientCnxnSocketNetty$1@127] - future isn't success, cause: {}
    [junit] java.net.ConnectException: Connection refused: 127.0.0.1/127.0.0.1:16672
    [junit] 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    [junit] 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    [junit] 	at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
    [junit] 	at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    [junit] 	at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    [junit] 	at