I deleted my storm local dir and zookeeper data/ log dirs and restarted 
zookeeper and storm again and it works fine now.

This is still very weird and I am still investigating the root cause.

Thanks,
Rushabh

From: P. Taylor Goetz [mailto:[email protected]]
Sent: Tuesday, August 05, 2014 10:57 AM
To: [email protected]
Subject: Re: Storm workers not starting because of netty reconnect : [INFO] 
Reconnect started for Netty-Client

I would double check to make sure hostname resolution is working properly on 
all hosts in the cluster, and that there are not any firewall rules that would 
prevent connections on the supervisor ports.

I would also remove any Netty configuration overrides from storm.yaml to allow 
the defaults to take effect - only override the defaults when you need to.

- Taylor


On Aug 4, 2014, at 2:37 PM, Rushabh Shah 
<[email protected]<mailto:[email protected]>> wrote:


Hi,
I have a topology that was deployed on a storm cluster and was running fine 
until I started facing the following issue.

I can see that in supervisor logs, the supervisor is trying to launch the 
topology on a worker but it is not able to start it.



2014-08-04 18:27:33 b.s.d.supervisor [INFO] Launching worker with assignment 
#backtype.storm.daemon.supervisor.LocalAssignment{:storm-id 
"SALABPOSITION-5-1-2-1406938773", :executors ([3 3] [5 5] [7 7] [9 9] [11 11] 
[1 1])} for this supervisor 48753f4c-e0fd-48f3-a149-1f52491da5b9 on port 6702 
with id f620ab27-61fd-4b87-b017-dea1e811074b
2014-08-04 18:27:33 b.s.d.supervisor [INFO] Launching worker with command: 
'/integral/opt/jdk16/bin/java' '-server' '-Xmx768m' 
'-Djava.net.preferIPv4Stack=true' '-Djava.net.preferIPv4Stack=true' 
'-Xmanagement:ssl=false,authenticate=false,port=7099' '-Xmx8192m' 
'-Djava.library.path=/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/resources/Linux-amd64:/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/resources:/usr/local/lib:/opt/local/lib:/usr/lib'
 '-Dlogfile.name=worker-6702.log' 
'-Dstorm.home=/integral/opt/apache-storm-0.9.2-incubating' 
'-Dlogback.configurationFile=/integral/opt/apache-storm-0.9.2-incubating/logback/cluster.xml'
 '-Dstorm.id=SALABPOSITION-5-1-2-1406938773' 
'-Dworker.id=f620ab27-61fd-4b87-b017-dea1e811074b' '-Dworker.port=6702' '-cp' 
'/integral/opt/apache-storm-0.9.2-incubating/lib/ring-devel-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/servlet-api-2.5-20081211.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/compojure-1.1.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.cli-0.2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/joda-time-2.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/carbonite-1.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.macro-0.1.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clj-time-0.4.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-codec-1.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-fileupload-1.2.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/httpclient-4.3.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/asm-4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/logback-classic-1.0.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jetty-6.1.26.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-jetty-adapter-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/netty-3.2.2.Final.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/slf4j-api-1.6.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/guava-13.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/objenesis-1.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/kryo-2.21.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/httpcore-4.3.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/zookeeper-3.4.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/logback-core-1.0.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jgrapht-core-0.9.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/curator-client-2.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-lang-2.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/snakeyaml-1.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clj-stacktrace-0.2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/minlog-1.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-logging-1.1.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/disruptor-2.10.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/log4j-over-slf4j-1.6.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/curator-framework-2.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jline-2.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-exec-1.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/core.incubator-0.1.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/json-simple-1.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/hiccup-0.3.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clojure-1.5.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/reflectasm-1.07-shaded.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/chill-java-0.3.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-io-2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clout-1.0.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/servlet-api-2.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.logging-0.2.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-core-1.1.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/netty-3.6.3.Final.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/math.numeric-tower-0.0.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jetty-util-6.1.26.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-servlet-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/storm-core-0.9.2-incubating.jar:/integral/opt/apache-storm-0.9.2-incubating/conf:/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/stormjar.jar'
 'backtype.storm.daemon.worker' 'SALABPOSITION-5-1-2-1406938773' 
'48753f4c-e0fd-48f3-a149-1f52491da5b9' '6702' 
'f620ab27-61fd-4b87-b017-dea1e811074b'
2014-08-04 18:27:33 b.s.d.supervisor [INFO] 
f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:27:33 b.s.d.supervisor [INFO] 
f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:27:34 b.s.d.supervisor [INFO] 
f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
.....

After 120 seconds the supervisor will timeout and try to start the topology on 
another worker.


2014-08-04 18:29:32 b.s.d.supervisor [INFO] 
f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:29:32 b.s.d.supervisor [INFO] 
f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Worker 
f620ab27-61fd-4b87-b017-dea1e811074b failed to start
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shutting down and clearing state 
for id f620ab27-61fd-4b87-b017-dea1e811074b. Current supervisor time: 
1407176973. State: :not-started, Heartbeat: nil
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shutting down 
48753f4c-e0fd-48f3-a149-1f52491da5b9:f620ab27-61fd-4b87-b017-dea1e811074b
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shut down 
48753f4c-e0fd-48f3-a149-1f52491da5b9:f620ab27-61fd-4b87-b017-dea1e811074b
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Launching worker with assignment 
#backtype.storm.daemon.supervisor.LocalAssignment{:storm-id 
"SALABPOSITION-5-1-2-1406938773", :executors ([3 3] [5 5] [7 7] [9 9] [11 11] 
[1 1])} for this supervisor 48753f4c-e0fd-48f3-a149-1f52491da5b9 on port 6703 
with id c290b2ec-7969-44ca-ac3e-008b8841ef3f


And this process keeps on repeating.


On the worker logs, I see the following :


2014-08-04 08:09:53 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-supervisor2.integral.com/192.168.239.166:6703<http://netty-client-supervisor2.integral.com/192.168.239.166:6703>...
 [14]
2014-08-04 08:09:54 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-supervisor2.integral.com/192.168.239.166:6703<http://netty-client-supervisor2.integral.com/192.168.239.166:6703>...
 [15]
2014-08-04 08:09:55 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-supervisor2.integral.com/192.168.239.166:6703<http://netty-client-supervisor2.integral.com/192.168.239.166:6703>...
 [16]
......
2014-08-04 08:10:10 b.s.m.n.Client [INFO] Closing Netty Client 
Netty-Client-supervisor2.integral.com/192.168.239.166:6703<http://netty-client-supervisor2.integral.com/192.168.239.166:6703>
2014-08-04 08:10:10 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent 
with 
Netty-Client-supervisor2.integral.com/192.168.239.166:6703<http://netty-client-supervisor2.integral.com/192.168.239.166:6703>...,
 timeout: 600000ms, pendings: 0
2014-08-04 08:10:10 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-supervisor2.integral.com/192.168.239.166:6701<http://netty-client-supervisor2.integral.com/192.168.239.166:6701>...
 [0]
2014-08-04 08:10:10 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, 
and does not take requests any more
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
 ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
 ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) 
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.disruptor$consume_loop_STAR_$fn__758.invoke(disruptor.clj:94) 
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431) 
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
Caused by: java.lang.RuntimeException: Client is being closed, and does not 
take requests any more
        at backtype.storm.messaging.netty.Client.send(Client.java:194) 
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) 
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927$fn__5928.invoke(worker.clj:322)
 ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927.invoke(worker.clj:323)
 ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58) 
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
 ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        ... 6 common frames omitted
2014-08-04 08:10:10 b.s.util [INFO] Halting process: ("Async loop died!")


It seems that the supervisor is not able to communicate with the workers 
because of some netty connection issues.

I would appreciate if somebody can help me in this regard.

Thanks,
Rushabh

Reply via email to