Grant Henke has uploaded this change for review. (
http://gerrit.cloudera.org:8080/9655
Change subject: java: reduce scary exception spew and improve logging
......................................................................
java: reduce scary exception spew and improve logging
In real environments there are various types of failures that are
"expected". In particular, it's not unlikely for the client to
occasionally connect to a server which has gone down or which is not the
leader for a given tablet anymore. In that case, we should make sure
that the logging output is easy to read and doesn't include irrelevant
stack traces stemming from Netty internals.
This patch addresses several of those cases. In particular, it gets rid
of "unexpected exception" messages (and associated stacks) in the case
of "connection refused" errors. It also helpfully includes the
underlying error when invalidating the tablet cache for a location.
There's a relatively simple new unit test, and I also ran some
before/after comparisons using the "loadgen" sample from the
kudu-examples repo against a local cluster with one TS:
Case 1: Master down when we start the client:
============================================================
Before the patch
--------------
[New I/O worker #12] WARN org.apache.kudu.client.ConnectToCluster - Error
receiving response from localhost:7051
org.apache.kudu.client.RecoverableException: connection closed
at org.apache.kudu.client.Connection.channelClosed(Connection.java:254)
at
org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
at org.apache.kudu.client.Connection.handleUpstream(Connection.java:238)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at
org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.channelClosed(ReadTimeoutHandler.java:176)
at
org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at
org.apache.kudu.shaded.org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at
org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
at
org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
at
org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at
org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
at
org.apache.kudu.shaded.org.jboss.netty.channel.Channels$6.run(Channels.java:457)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at
org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at
org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[New I/O boss #17] ERROR org.apache.kudu.client.Connection - [peer
master-localhost:7051(localhost:7051)] unexpected exception from downstream on
[id: 0xf9759051]
java.net.ConnectException: Connection refused: localhost/127.0.0.1:7051
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at
org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at
org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[New I/O worker #12] WARN org.apache.kudu.client.ConnectToCluster - Unable to
find the leader master localhost:7051; will retry
... repeats with exponential backoff
After the patch
--------------
[New I/O boss #17] INFO org.apache.kudu.client.Connection - Failed to connect
to peer master-localhost:7051(localhost:7051): Connection refused:
localhost/127.0.0.1:7051
[New I/O worker #1] INFO org.apache.kudu.client.ConnectToCluster - Unable to
connect to master localhost:7051: Connection refused: localhost/127.0.0.1:7051
[New I/O worker #1] WARN org.apache.kudu.client.ConnectToCluster - Unable to
find the leader master localhost:7051; will retry
... repeats with exponential backoff
Case 2: Tserver is down
============================================================
Before the patch
----------------
[New I/O worker #11] INFO org.apache.kudu.client.AsyncKuduClient - Removing
server fc9f6ab955cc47a8ab3653d0170305b0 from this tablet's cache
9da164fbcd93404ea8e38e9491eb3fa8
[New I/O boss #17] ERROR org.apache.kudu.client.Connection - [peer
fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050)] unexpected exception from
downstream on [id: 0x270e3a82]
java.net.ConnectException: Connection refused: todd-laptop/127.0.1.1:7050
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at
org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at
org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
... repeats with exponential backoff
After the patch
---------------
[New I/O boss #17] INFO org.apache.kudu.client.Connection - Failed to connect
to peer fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050): Connection refused:
todd-laptop/127.0.1.1:7050
[New I/O worker #15] INFO org.apache.kudu.client.AsyncKuduClient - Invalidating
location fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050) for tablet
478ddbaafb5b494ea0de8e2d0ed45a00: Connection refused: todd-laptop/127.0.1.1:7050
... repeats with exponential backoff
Case 3: A tablet is bad
============================================================
In this case I manually delete a non-replicated tablet that is being written to
using 'kudu remote_replica delete'. This is unrecoverable since there is
no other replica.
Before the patch
-----------------
[New I/O worker #10] INFO org.apache.kudu.client.AsyncKuduClient - Removing
server fc9f6ab955cc47a8ab3653d0170305b0 from this tablet's cache
a633f4a0c14b4404a8ae1f825b378867
... repeats with exponential backoff
After the patch
----------------
[New I/O worker #9] INFO org.apache.kudu.client.AsyncKuduClient - Invalidating
location 36622c46b25f4da9ada43f8591728053(todd-laptop:7050) for tablet
c0b9488597e8447595ad0ee0a3378f95: Tablet not RUNNING: STOPPED
... repeats with exponential backoff
Change-Id: I4b8be871693fecc1ee46a4e238dd2ed8f0730d4b
Reviewed-on: http://gerrit.cloudera.org:8080/9644
Reviewed-by: Grant Henke <[email protected]>
Tested-by: Grant Henke <[email protected]>
(cherry picked from commit ead756844ce9ada904fcc3666df25692f63e76b8)
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToCluster.java
M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
M
java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
5 files changed, 78 insertions(+), 7 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/9655/1
--
To view, visit http://gerrit.cloudera.org:8080/9655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: branch-1.7.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4b8be871693fecc1ee46a4e238dd2ed8f0730d4b
Gerrit-Change-Number: 9655
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>