Grant Henke has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9655 )

Change subject: java: reduce scary exception spew and improve logging
......................................................................

java: reduce scary exception spew and improve logging

In real environments there are various types of failures that are
"expected". In particular, it's not unlikely for the client to
occasionally connect to a server which has gone down or which is not the
leader for a given tablet anymore. In that case, we should make sure
that the logging output is easy to read and doesn't include irrelevant
stack traces stemming from Netty internals.

This patch addresses several of those cases. In particular, it gets rid
of "unexpected exception" messages (and associated stacks) in the case
of "connection refused" errors. It also helpfully includes the
underlying error when invalidating the tablet cache for a location.

There's a relatively simple new unit test, and I also ran some
before/after comparisons using the "loadgen" sample from the
kudu-examples repo against a local cluster with one TS:

Case 1: Master down when we start the client:
============================================================

Before the patch
--------------
[New I/O worker #12] WARN org.apache.kudu.client.ConnectToCluster - Error 
receiving response from localhost:7051
org.apache.kudu.client.RecoverableException: connection closed
        at org.apache.kudu.client.Connection.channelClosed(Connection.java:254)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
        at org.apache.kudu.client.Connection.handleUpstream(Connection.java:238)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at 
org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.channelClosed(ReadTimeoutHandler.java:176)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at 
org.apache.kudu.shaded.org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at 
org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
        at 
org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.Channels$6.run(Channels.java:457)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at 
org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at 
org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[New I/O boss #17] ERROR org.apache.kudu.client.Connection - [peer 
master-localhost:7051(localhost:7051)] unexpected exception from downstream on 
[id: 0xf9759051]
java.net.ConnectException: Connection refused: localhost/127.0.0.1:7051
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at 
org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at 
org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[New I/O worker #12] WARN org.apache.kudu.client.ConnectToCluster - Unable to 
find the leader master localhost:7051; will retry

... repeats with exponential backoff

After the patch
--------------
[New I/O boss #17] INFO org.apache.kudu.client.Connection - Failed to connect 
to peer master-localhost:7051(localhost:7051): Connection refused: 
localhost/127.0.0.1:7051
[New I/O worker #1] INFO org.apache.kudu.client.ConnectToCluster - Unable to 
connect to master localhost:7051: Connection refused: localhost/127.0.0.1:7051
[New I/O worker #1] WARN org.apache.kudu.client.ConnectToCluster - Unable to 
find the leader master localhost:7051; will retry

... repeats with exponential backoff

Case 2: Tserver is down
============================================================

Before the patch
----------------
[New I/O worker #11] INFO org.apache.kudu.client.AsyncKuduClient - Removing 
server fc9f6ab955cc47a8ab3653d0170305b0 from this tablet's cache 
9da164fbcd93404ea8e38e9491eb3fa8
[New I/O boss #17] ERROR org.apache.kudu.client.Connection - [peer 
fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050)] unexpected exception from 
downstream on [id: 0x270e3a82]
java.net.ConnectException: Connection refused: todd-laptop/127.0.1.1:7050
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at 
org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at 
org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at 
org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

... repeats with exponential backoff

After the patch
---------------
[New I/O boss #17] INFO org.apache.kudu.client.Connection - Failed to connect 
to peer fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050): Connection refused: 
todd-laptop/127.0.1.1:7050
[New I/O worker #15] INFO org.apache.kudu.client.AsyncKuduClient - Invalidating 
location fc9f6ab955cc47a8ab3653d0170305b0(todd-laptop:7050) for tablet 
478ddbaafb5b494ea0de8e2d0ed45a00: Connection refused: todd-laptop/127.0.1.1:7050

... repeats with exponential backoff

Case 3: A tablet is bad
============================================================
In this case I manually delete a non-replicated tablet that is being written to
using 'kudu remote_replica delete'. This is unrecoverable since there is
no other replica.

Before the patch
-----------------
[New I/O worker #10] INFO org.apache.kudu.client.AsyncKuduClient - Removing 
server fc9f6ab955cc47a8ab3653d0170305b0 from this tablet's cache 
a633f4a0c14b4404a8ae1f825b378867

... repeats with exponential backoff

After the patch
----------------
[New I/O worker #9] INFO org.apache.kudu.client.AsyncKuduClient - Invalidating 
location 36622c46b25f4da9ada43f8591728053(todd-laptop:7050) for tablet 
c0b9488597e8447595ad0ee0a3378f95: Tablet not RUNNING: STOPPED

... repeats with exponential backoff

Change-Id: I4b8be871693fecc1ee46a4e238dd2ed8f0730d4b
Reviewed-on: http://gerrit.cloudera.org:8080/9644
Reviewed-by: Grant Henke <[email protected]>
Tested-by: Grant Henke <[email protected]>
(cherry picked from commit ead756844ce9ada904fcc3666df25692f63e76b8)
Reviewed-on: http://gerrit.cloudera.org:8080/9655
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToCluster.java
M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
M 
java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
5 files changed, 78 insertions(+), 7 deletions(-)

Approvals:
  Grant Henke: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/9655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.7.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I4b8be871693fecc1ee46a4e238dd2ed8f0730d4b
Gerrit-Change-Number: 9655
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>

Reply via email to