Hey all, I've been running into network issues on the Amazon network over
the last couple of days.  This has exposed some unexpected behavior in my
Storm topology: it looks like Netty exceptions can kill a worker's async
loop and bring the worker down.  Is this a bug?  Is there a configuration
setting I can use to change this behavior?  I'm seeing exceptions like this:

2014-01-31 16:56:35 STDIO [ERROR] Jan 31, 2014 4:56:35 PM org.jboss.netty.channel.DefaultChannelPipeline
WARNING: An exception was thrown by a user handler while handling an exception event ([id: 0x2a528847] EXCEPTION: java.net.ConnectException: Connection refused)
java.lang.IllegalArgumentException: timeout value is negative
at java.lang.Thread.sleep(Native Method)
at backtype.storm.messaging.netty.Client.reconnect(Client.java:78)
at backtype.storm.messaging.netty.StormClientHandler.exceptionCaught(StormClientHandler.java:108)
at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:377)
at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:525)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:78)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:41)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
2014-01-31 16:56:42 b.s.m.n.Client [WARN] Remote address is not reachable. We will close this client.
2014-01-31 16:56:43 c.g.s.c.s.LwesSpout [WARN] Event queue depth: count: 10000, total: 4419, [min, average, max]=[0, 0.44, 41]
2014-01-31 16:56:45 STDIO [ERROR] Jan 31, 2014 4:56:45 PM org.jboss.netty.channel.DefaultChannelPipeline
WARNING: An exception was thrown by a user handler while handling an exception event ([id: 0x2b41f372] EXCEPTION: java.net.ConnectException: Connection refused)
java.lang.IllegalArgumentException: timeout value is negative
at java.lang.Thread.sleep(Native Method)
at backtype.storm.messaging.netty.Client.reconnect(Client.java:78)
at backtype.storm.messaging.netty.StormClientHandler.exceptionCaught(StormClientHandler.java:108)
at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:377)
at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:525)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:78)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:41)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
2014-01-31 16:56:45 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:90) ~[storm-core-0.9.0.1.jar:na]
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:61) ~[storm-core-0.9.0.1.jar:na]
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62) ~[storm-core-0.9.0.1.jar:na]
at backtype.storm.disruptor$consume_loop_STAR_$fn__2975.invoke(disruptor.clj:74) ~[storm-core-0.9.0.1.jar:na]
at backtype.storm.util$async_loop$fn__444.invoke(util.clj:403) ~[storm-core-0.9.0.1.jar:na]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
at java.lang.Thread.run(Thread.java:722) [na:1.7.0_07]
Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
at backtype.storm.messaging.netty.Client.send(Client.java:109) ~[storm-netty-0.9.0.1.jar:na]
at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5867$fn__5868.invoke(worker.clj:304) ~[storm-core-0.9.0.1.jar:na]
at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5867.invoke(worker.clj:293) ~[storm-core-0.9.0.1.jar:na]
at backtype.storm.disruptor$clojure_handler$reify__2962.onEvent(disruptor.clj:43) ~[storm-core-0.9.0.1.jar:na]
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:87) ~[storm-core-0.9.0.1.jar:na]
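
For reference, the "timeout value is negative" message in the traces above is what Thread.sleep throws when it is handed a negative delay. One common way a retry delay ends up negative is integer overflow in an exponential backoff. The sketch below only illustrates that general failure mode and a clamped alternative; it is an assumption about the kind of bug involved, not the actual code in backtype.storm.messaging.netty.Client, and the class and method names (BackoffSketch, naiveSleepMs, clampedSleepMs) are hypothetical.

```java
public class BackoffSketch {

    // Naive exponential backoff in 32-bit int arithmetic: baseMs * 2^attempt.
    // After enough attempts the product wraps around and can become negative.
    public static int naiveSleepMs(int baseMs, int attempt) {
        return baseMs * (1 << attempt);
    }

    // Safer variant: compute in long, cap the exponent, clamp to [0, maxMs].
    public static long clampedSleepMs(int baseMs, int attempt, long maxMs) {
        int shift = Math.min(Math.max(attempt, 0), 30);
        long sleepMs = (long) baseMs * (1L << shift);
        return Math.max(0L, Math.min(sleepMs, maxMs));
    }

    public static void main(String[] args) throws InterruptedException {
        // 100 * 2^25 does not fit in an int, so the result wraps to a negative value.
        int bad = naiveSleepMs(100, 25);
        System.out.println("naive delay: " + bad);
        try {
            Thread.sleep(bad);
        } catch (IllegalArgumentException e) {
            // Thread.sleep rejects negative delays instead of sleeping.
            System.out.println("Thread.sleep rejected it: " + e.getMessage());
        }

        // The clamped variant never returns a negative delay.
        long ok = clampedSleepMs(100, 25, 1000L);
        System.out.println("clamped delay: " + ok);
        Thread.sleep(ok);
    }
}
```

If something like this is what happens in the reconnect path, the exception escapes the Netty exception handler, which would match the "exception was thrown by a user handler" warnings in the log.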
