Hi,
I have a topology that was deployed on a Storm cluster and was running fine 
until I ran into the following issue.

In the supervisor logs, I can see the supervisor trying to launch the 
topology on a worker, but the worker never starts.



2014-08-04 18:27:33 b.s.d.supervisor [INFO] Launching worker with assignment 
#backtype.storm.daemon.supervisor.LocalAssignment{:storm-id 
"SALABPOSITION-5-1-2-1406938773", :executors ([3 3] [5 5] [7 7] [9 9] [11 11] 
[1 1])} for this supervisor 48753f4c-e0fd-48f3-a149-1f52491da5b9 on port 6702 
with id f620ab27-61fd-4b87-b017-dea1e811074b
2014-08-04 18:27:33 b.s.d.supervisor [INFO] Launching worker with command: 
'/integral/opt/jdk16/bin/java' '-server' '-Xmx768m' 
'-Djava.net.preferIPv4Stack=true' '-Djava.net.preferIPv4Stack=true' 
'-Xmanagement:ssl=false,authenticate=false,port=7099' '-Xmx8192m' 
'-Djava.library.path=/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/resources/Linux-amd64:/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/resources:/usr/local/lib:/opt/local/lib:/usr/lib'
 '-Dlogfile.name=worker-6702.log' 
'-Dstorm.home=/integral/opt/apache-storm-0.9.2-incubating' 
'-Dlogback.configurationFile=/integral/opt/apache-storm-0.9.2-incubating/logback/cluster.xml'
 '-Dstorm.id=SALABPOSITION-5-1-2-1406938773' 
'-Dworker.id=f620ab27-61fd-4b87-b017-dea1e811074b' '-Dworker.port=6702' '-cp' 
'/integral/opt/apache-storm-0.9.2-incubating/lib/ring-devel-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/servlet-api-2.5-20081211.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/compojure-1.1.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.cli-0.2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/joda-time-2.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/carbonite-1.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.macro-0.1.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clj-time-0.4.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-codec-1.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-fileupload-1.2.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/httpclient-4.3.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/asm-4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/logback-classic-1.0.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jetty-6.1.26.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-jetty-adapter-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/netty-3.2.2.Final.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/slf4j-api-1.6.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/guava-13.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/objenesis-1.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/kryo-2.21.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/httpcore-4.3.2.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/zookeeper-3.4.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/logback-core-1.0.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jgrapht-core-0.9.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/curator-client-2.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-lang-2.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/snakeyaml-1.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clj-stacktrace-0.2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/minlog-1.2.jar:/
integral/opt/apache-storm-0.9.2-incubating/lib/commons-logging-1.1.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/disruptor-2.10.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/log4j-over-slf4j-1.6.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/curator-framework-2.4.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jline-2.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-exec-1.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/core.incubator-0.1.0.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/json-simple-1.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/hiccup-0.3.6.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clojure-1.5.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/reflectasm-1.07-shaded.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/chill-java-0.3.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/commons-io-2.4.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/clout-1.0.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/servlet-api-2.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/tools.logging-0.2.3.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-core-1.1.5.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/netty-3.6.3.Final.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/math.numeric-tower-0.0.1.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/jetty-util-6.1.26.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/ring-servlet-0.3.11.jar:/integral/opt/apache-storm-0.9.2-incubating/lib/storm-core-0.9.2-incubating.jar:/integral/opt/apache-storm-0.9.2-incubating/conf:/app/storm/supervisor/stormdist/SALABPOSITION-5-1-2-1406938773/stormjar.jar'
 'backtype.storm.daemon.worker' 'SALABPOSITION-5-1-2-1406938773' 
'48753f4c-e0fd-48f3-a149-1f52491da5b9' '6702' 
'f620ab27-61fd-4b87-b017-dea1e811074b'
2014-08-04 18:27:33 b.s.d.supervisor [INFO] 
f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:27:33 b.s.d.supervisor [INFO] 
f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:27:34 b.s.d.supervisor [INFO] 
f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
.....

After 120 seconds, the supervisor times out and tries to start the topology on 
another worker (port 6703 in the log below).


2014-08-04 18:29:32 b.s.d.supervisor [INFO] 
f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:29:32 b.s.d.supervisor [INFO] 
f620ab27-61fd-4b87-b017-dea1e811074b still hasn't started
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Worker 
f620ab27-61fd-4b87-b017-dea1e811074b failed to start
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shutting down and clearing state 
for id f620ab27-61fd-4b87-b017-dea1e811074b. Current supervisor time: 
1407176973. State: :not-started, Heartbeat: nil
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shutting down 
48753f4c-e0fd-48f3-a149-1f52491da5b9:f620ab27-61fd-4b87-b017-dea1e811074b
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Shut down 
48753f4c-e0fd-48f3-a149-1f52491da5b9:f620ab27-61fd-4b87-b017-dea1e811074b
2014-08-04 18:29:33 b.s.d.supervisor [INFO] Launching worker with assignment 
#backtype.storm.daemon.supervisor.LocalAssignment{:storm-id 
"SALABPOSITION-5-1-2-1406938773", :executors ([3 3] [5 5] [7 7] [9 9] [11 11] 
[1 1])} for this supervisor 48753f4c-e0fd-48f3-a149-1f52491da5b9 on port 6703 
with id c290b2ec-7969-44ca-ac3e-008b8841ef3f


This process keeps repeating.
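
The 120-second gap can be confirmed from the log timestamps themselves. Below is a minimal Python sketch, assuming the supervisor log line format shown above (timestamp, `b.s.d.supervisor`, `[INFO]`, message) and that long entries may be wrapped across several lines. The function name `launch_to_failure_seconds` is hypothetical and not part of Storm.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the supervisor log lines above.
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) b\.s\.d\.supervisor \[INFO\] ?(.*)$"
)

def launch_to_failure_seconds(log_text, worker_id):
    """Return the seconds between the 'Launching worker with assignment'
    entry and the 'failed to start' entry for worker_id, or None if
    either entry is missing."""
    # Rejoin entries that were wrapped across several lines.
    entries = []
    for line in log_text.splitlines():
        m = ENTRY.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append([ts, m.group(2).strip()])
        elif entries:
            entries[-1][1] = (entries[-1][1] + " " + line.strip()).strip()

    launched = failed = None
    for ts, msg in entries:
        if worker_id not in msg:
            continue
        if msg.startswith("Launching worker with assignment") and launched is None:
            launched = ts
        elif "failed to start" in msg:
            failed = ts

    if launched is None or failed is None:
        return None
    return (failed - launched).total_seconds()
```

Run against the supervisor log with the worker id f620ab27-61fd-4b87-b017-dea1e811074b, this should report the interval between the launch at 18:27:33 and the failure at 18:29:33.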


In the worker logs, I see the following:


2014-08-04 08:09:53 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-supervisor2.integral.com/192.168.239.166:6703... [14]
2014-08-04 08:09:54 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-supervisor2.integral.com/192.168.239.166:6703... [15]
2014-08-04 08:09:55 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-supervisor2.integral.com/192.168.239.166:6703... [16]
......
2014-08-04 08:10:10 b.s.m.n.Client [INFO] Closing Netty Client 
Netty-Client-supervisor2.integral.com/192.168.239.166:6703
2014-08-04 08:10:10 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent 
with Netty-Client-supervisor2.integral.com/192.168.239.166:6703..., timeout: 
600000ms, pendings: 0
2014-08-04 08:10:10 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-supervisor2.integral.com/192.168.239.166:6701... [0]
2014-08-04 08:10:10 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, 
and does not take requests any more
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
 ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
 ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) 
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.disruptor$consume_loop_STAR_$fn__758.invoke(disruptor.clj:94) 
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431) 
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]
Caused by: java.lang.RuntimeException: Client is being closed, and does not 
take requests any more
        at backtype.storm.messaging.netty.Client.send(Client.java:194) 
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) 
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927$fn__5928.invoke(worker.clj:322)
 ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927.invoke(worker.clj:323)
 ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58) 
~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
 ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        ... 6 common frames omitted
2014-08-04 08:10:10 b.s.util [INFO] Halting process: ("Async loop died!")


It seems that the supervisor is not able to communicate with the workers 
because of some Netty connection issue.
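
One way to test this connectivity hypothesis is to check whether the worker ports accept TCP connections from the host where the failing worker runs. The following is a minimal Python sketch using only the standard library; `port_is_open` is a hypothetical helper, not part of Storm, and a successful connection only shows that something is listening on the port, not that the Storm worker behind it is healthy.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, calling `port_is_open("supervisor2.integral.com", 6703)` and the same for ports 6701 and 6702 (the host and ports that appear in the logs above) would show which worker ports are reachable.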

I would appreciate it if somebody could help me with this.

Thanks,
Rushabh


