Yes, I've found the logs you mentioned in the Nimbus log. But I still can't find any indication in the worker logs. The first log line generated by Storm after the worker starts is "Async loop died".
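Since the worker JVM seems to exit before Storm writes anything to its own log, one thing that might help is making the JVM leave its own crash evidence (a heap dump on OutOfMemoryError and an hs_err fatal-error file in a known directory). The snippet below is only a rough sketch of how that could be passed along when resubmitting the topology; the class name, the buildTopology() placeholder, the topology name and the paths are illustrative, and it assumes adding topology.worker.childopts on top of the existing worker.childopts is acceptable for this cluster:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.generated.StormTopology;

    public class SubmitWithJvmDiagnostics {
        public static void main(String[] args) throws Exception {
            // Placeholder for however the topology is normally built.
            StormTopology topology = buildTopology();

            Config conf = new Config();
            // Extra per-topology JVM options for the workers: dump the heap on
            // OutOfMemoryError and write the fatal-error log to a known path,
            // so a silent JVM exit at least leaves something to look at.
            conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS,
                    "-XX:+HeapDumpOnOutOfMemoryError"
                    + " -XX:HeapDumpPath=/data0/logs/storm"
                    + " -XX:ErrorFile=/data0/logs/storm/hs_err_pid%p.log");

            StormSubmitter.submitTopology("message_storm", conf, topology);
        }

        private static StormTopology buildTopology() {
            // Illustrative stub: build and return the real topology here.
            throw new UnsupportedOperationException("build the real topology here");
        }
    }

If the process is being killed from outside the JVM (for example by the kernel OOM killer), even these flags will not produce anything, so the OS logs on the supervisor host may also be worth a look.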
From: [email protected]
Date: Thu, 7 May 2015 22:56:41 -0700
Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
To: [email protected]

The "State" being ":disallowed" usually indicates that the worker's executors' slot assignment has changed. If you look at the Nimbus logs, I would expect to see lines stating that the affected executors were noticed to be "not alive" by the Nimbus, followed immediately by a "Setting new assignment for topology id" line as the Nimbus creates a new assignment for this topology. As for why the worker itself died: is there no indication in the worker logs?

- Erik

On Thu, May 7, 2015 at 8:36 PM, SuXingyu <[email protected]> wrote:

I checked my supervisor log and found lines like "Shutting down and clearing state for id {worker.id}", but I still can't find the cause of the worker shutting down. Here are the supervisor logs. It seems that the worker's JVM exited before the supervisor killed it.

2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "message_storm-101-17-1430903865", :executors ...} for this supervisor 95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id 95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with command: '/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g' '-DServer=mblog' '-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m' '-XX:MaxPermSize=128m' '-XX:MaxTenuringThreshold=4' '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent' '-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps' '-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails' '-XX:+PrintGCTimeStamps' '-XX:+PrintGCApplicationStoppedTime' '-XX:+PrintGCApplicationConcurrentTime' '-Xloggc:/data0/logs/storm/gc.log' '-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib' '-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm' '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs' '-Dlogback.configurationFile=/data0/storm/logback/cluster.xml' '-Dstorm.id=message_storm-101-17-1430903865' '-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701' '-cp'
'/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar' 'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865' '95d77940-08b2-4f7a-9660-03d8f61a2529' '6701' '95722030-7785-40d6-a47a-cc072c4c9d83'
2015-05-06 17:21:47:850 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:350 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:850 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:351 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:851 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:352 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:852 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:352 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:853 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:52:353 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time: 1430904337. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305, :storm-id "message_storm-101-17-1430903865", :executors #{...}, :port 6701}
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187. Process is probably already dead.
2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187. Process is probably already dead.
2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83

From: [email protected]
To: [email protected]
Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
Date: Thu, 7 May 2015 15:15:50 +0000

Look into your supervisor logs, with them set to INFO. Look for something like:

“Shutting down and clearing state for id {worker.id}”

That id will match your worker’s logs. The reason for the shutdown will be in the supervisor’s logs.

From: SuXingyu <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, May 7, 2015 at 02:44
To: "[email protected]" <[email protected]>
Subject: Storm worker died - Client is being closed, and does not take requests any more

Hi all,

I am running a Kafka-Storm cluster in distributed mode. There are four supervisors in my Storm cluster. Most of the time it works fine, but after I redeployed a topology yesterday, all four workers in the cluster restarted after running for several minutes. I checked the Storm logs and found this:

2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
    at backtype.storm.messaging.netty.Client.send(Client.java:185) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]
    ... 6 common frames omitted
2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop died!")
java.lang.RuntimeException: ("Async loop died!")
    at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
    at backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92) [storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700
2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT

These logs appeared several times over about twenty minutes, and the cluster recovered after a rebalance operation in the Storm UI. Any idea why this may be happening?

Best regards,
Xingyu
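For anyone tracing the same symptoms: the behaviour described in this thread is bounded by a handful of settings. The heartbeat timeouts decide how long Nimbus waits before treating executors as "not alive" and computing a new assignment (which is what later shows up on the supervisor as State: :disallowed, as Erik explains above), and the Netty retry settings bound how long a worker's messaging client keeps trying to reconnect to a peer before it is closed, after which sends fail with "Client is being closed, and does not take requests any more" (as far as I can tell from the 0.9.3 client). The sketch below only names the relevant keys; the constant names are from backtype.storm.Config in 0.9.3, the values are purely illustrative, and in practice these are normally tuned cluster-wide in storm.yaml rather than per topology:

    import backtype.storm.Config;

    // Sketch only: names the configuration keys discussed in this thread.
    // The values are illustrative, not recommendations.
    public class TimeoutAndNettyKnobs {
        public static void main(String[] args) {
            Config conf = new Config();

            // Seconds Nimbus waits without executor heartbeats before marking
            // them "not alive" and creating a new assignment.
            conf.put(Config.NIMBUS_TASK_TIMEOUT_SECS, 30);

            // Seconds a supervisor waits without a local worker heartbeat
            // before it kills and relaunches that worker.
            conf.put(Config.SUPERVISOR_WORKER_TIMEOUT_SECS, 30);

            // Netty client reconnect behaviour: retry count and the backoff
            // window in milliseconds before the client gives up.
            conf.put(Config.STORM_MESSAGING_NETTY_MAX_RETRIES, 30);
            conf.put(Config.STORM_MESSAGING_NETTY_MIN_SLEEP_MS, 100);
            conf.put(Config.STORM_MESSAGING_NETTY_MAX_SLEEP_MS, 1000);

            System.out.println(conf);
        }
    }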
