I understand there all several threads on this topic, but wasn't able to
find a definitive solution and so here I go.
I'm using Apache Storm 0.9.1-incubating. I'm also using Trident to fetch
messages from 0.8 kafka cluster using net.wurstmeister.storm:storm-kafka
-0.8-plus:0.4.0.
The cluster when run on local mode works well being able to retrieve
messages from kafka and does the data processing. When run on remote
cluster mode using JZMQ, it works pretty well too. However, I'm unable to
get it to work using Netty. I have been running with the following configs
for Netty.
storm.messaging.netty.buffer_size 5242880
storm.messaging.netty.client_worker_threads 1
storm.messaging.netty.max_retries 5
storm.messaging.netty.max_wait_ms 1000
storm.messaging.netty.min_wait_ms 100
storm.messaging.netty.server_worker_threads 1
storm.messaging.transport
backtype.storm.messaging.netty.Context
Here is the exception I see in the worker node:
2014-04-17 07:01:03 b.s.d.executor [INFO] Processing received message
source: __system:-1, stream: __tick, id: {}, [5]
2014-04-17 07:01:04 b.s.m.n.Client [INFO] Reconnect ... [1]
2014-04-17 07:01:04 b.s.m.n.Client [INFO] Reconnect ... [2]
2014-04-17 07:01:04 b.s.m.n.Client [INFO] Reconnect ... [3]
2014-04-17 07:01:05 b.s.m.n.Client [INFO] Reconnect ... [4]
*2014-04-17 07:01:06 b.s.m.n.Client [INFO] Reconnect ... [5]*
*2014-04-17 07:01:06 b.s.m.n.Client [WARN] Remote address is not reachable.
We will close this client.*
.
.
*2014-04-17 07:01:23 b.s.util [ERROR] Async loop died!*
*java.lang.RuntimeException: java.lang.RuntimeException: Client is being
closed, and does not take requests any more*
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:107)
~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:78)
~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:77)
~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at
backtype.storm.disruptor$consume_loop_STAR_$fn__6954.invoke(disruptor.clj:89)
~[na:na]
at backtype.storm.util$async_loop$fn__5761.invoke(util.clj:433)
~[na:na]
at clojure.lang.AFn.run(AFn.java:24) ~[clojure-1.4.0.jar:na]
at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
Caused by: java.lang.RuntimeException: Client is being closed, and does not
take requests any more
at backtype.storm.messaging.netty.Client.send(Client.java:125)
~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
at
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__9966$fn__9967.invoke(worker.clj:319)
~[na:na]
at
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__9966.invoke(worker.clj:308)
~[na:na]
at
backtype.storm.disruptor$clojure_handler$reify__6937.onEvent(disruptor.clj:58)
~[na:na]
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:104)
~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
... 6 common frames omitted
2014-04-17 07:01:23 b.s.util [INFO] Halting process: ("Async loop died!")
*Things I have already tried with no luck:*
- The connectivity to Zookeeper is good. I cleared off all the states
before a run and can see Zookeeper updated with the right storms,
supervisor states etc.
- The topology names and stream names are unique
- Tried a lower value for storm.messaging.netty.max_retries = 5
- Nimbus thrift port 6627 is open and accessible.
*Question:*
Where does Netty try to connect for me to get "*b.s.m.n.Client [WARN]
Remote address is not reachable. We will close this client."*
I would greatly appreciate if you folks can point me to the right direction.
Thanks,
Prasanna