If one or more workers were moved to other slots, you may hit this issue. https://issues.apache.org/jira/browse/STORM-737
Hope this helps.

2015-05-08 18:53 GMT+09:00 SuXingyu <[email protected]>:

> I am running my cluster using Storm 0.9.3. This exception was caused by sending a message while the Netty client is closing. But I still don't know why the Netty client closed.
>
> ------------------------------
> From: [email protected]
> Date: Thu, 7 May 2015 23:34:16 -0700
> Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
> To: [email protected]
>
> What version of Storm are you running? I had significant issues with Storm v0.9.3 having what I'll term "congestive collapse" related to the Netty implementation. Workers would continually die. The problem was resolved by switching the transport layer from Netty back to the original ZeroMQ implementation, but that was not very trivial.
>
> - Erik
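
The transport Erik refers to is selected cluster-wide in storm.yaml. A minimal sketch with the 0.9.x key names (the same setting must go on Nimbus and every supervisor, followed by a cluster restart; the ZeroMQ option also needs the native ZeroMQ/JZMQ libraries installed, likely the non-trivial part he mentions; values are illustrative, not a recommendation):

    # storm.yaml - pick exactly one transport
    storm.messaging.transport: "backtype.storm.messaging.netty.Context"   # 0.9.x default (Netty)
    # storm.messaging.transport: "backtype.storm.messaging.zmq"           # pre-0.9 ZeroMQ transport
    #
    # If you stay on Netty, these are the usual retry/backoff knobs to review first:
    # storm.messaging.netty.max_retries: 30
    # storm.messaging.netty.min_wait_ms: 100
    # storm.messaging.netty.max_wait_ms: 1000
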
> On Thu, May 7, 2015 at 11:28 PM, SuXingyu <[email protected]> wrote:
>
> Yes, I've found the logs you mentioned in the Nimbus log. But I still can't find any indication in the worker logs. The first log generated by Storm after the worker started is "Async loop died".
>
> ------------------------------
> From: [email protected]
> Date: Thu, 7 May 2015 22:56:41 -0700
> Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
> To: [email protected]
>
> The "State" being ":disallowed" usually indicates that the worker's executors' slot assignment has changed. If you look at the Nimbus logs, I would expect to see logs stating that the affected executors were noticed to be "not alive" by the Nimbus, and then an immediate "Setting new assignment for topology id" log line as the Nimbus creates a new assignment for this topology. As for why the Worker itself died -- is there no indication in the worker logs?
>
> - Erik
>
> On Thu, May 7, 2015 at 8:36 PM, SuXingyu <[email protected]> wrote:
>
> I checked my supervisor log, and found logs like "Shutting down and clearing state for id {worker.id}". But I still can't find the cause of the worker shutting down.
>
> Here are the logs in the supervisor. It seems that the worker's JVM exited before the supervisor killed it.
>
> 2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "message_storm-101-17-1430903865", :executors ...} for this supervisor 95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id 95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with command: '/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g' '-DServer=mblog' '-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m' '-XX:MaxPermSize=128m' '-XX:MaxTenuringThreshold=4' '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent' '-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps' '-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails' '-XX:+PrintGCTimeStamps' '-XX:+PrintGCApplicationStoppedTime' '-XX:+PrintGCApplicationConcurrentTime' '-Xloggc:/data0/logs/storm/gc.log' '-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib' '-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm' '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs' '-Dlogback.configurationFile=/data0/storm/logback/cluster.xml' '-Dstorm.id=message_storm-101-17-1430903865' '-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701' '-cp' '/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar' 'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865' '95d77940-08b2-4f7a-9660-03d8f61a2529' '6701' '95722030-7785-40d6-a47a-cc072c4c9d83'
> 2015-05-06 17:21:47:850 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:350 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:850 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:351 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:851 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:352 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:852 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:352 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:853 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:52:353 b.s.d.supervisor [INFO] 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time: 1430904337. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305, :storm-id "message_storm-101-17-1430903865", :executors #{...}, :port 6701}
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187. Process is probably already dead.
> 2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187. Process is probably already dead.
> 2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
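
The "State: :disallowed" in the excerpt above is the reassignment case Erik describes: Nimbus stopped seeing heartbeats from these executors, marked them "not alive", published a new assignment, and the supervisor then shut down the worker that no longer owned the slot. The timeouts that govern this are cluster-level storm.yaml settings; a minimal sketch using the 0.9.x key names (values shown are the usual defaults, for orientation only, not a tuning recommendation):

    # storm.yaml on the Nimbus node
    nimbus.task.timeout.secs: 30         # executor heartbeat gap before Nimbus reassigns its slot
    nimbus.supervisor.timeout.secs: 60   # supervisor heartbeat gap before its slots are reassigned

    # storm.yaml on the supervisor nodes
    supervisor.worker.timeout.secs: 30         # local heartbeat gap before the supervisor kills a worker
    supervisor.worker.start.timeout.secs: 120  # grace period for the "still hasn't started" phase above

Raising these only hides slow heartbeats; if long stop-the-world pauses on the 8 GB heap are the real cause, the gc.log path configured in the launch command above should show them.
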
> ------------------------------
> From: [email protected]
> To: [email protected]
> Subject: Re: Storm worker died - Client is being closed, and does not take requests any more
> Date: Thu, 7 May 2015 15:15:50 +0000
>
> Look into your supervisor logs, with them set to INFO.
>
> Look for something like:
> "Shutting down and clearing state for id {worker.id}"
>
> That id will match your worker's logs.
>
> The reason for the shutdown will be in the supervisor's logs.
>
> From: SuXingyu <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: 2015, Thursday, May 7 at 02:44
> To: "[email protected]" <[email protected]>
> Subject: Storm worker died - Client is being closed, and does not take requests any more
>
> Hi all,
>
> I am running a Kafka-Storm cluster in distributed mode. There are four supervisors in my Storm cluster. Most of the time it works fine, but after I redeployed a topology yesterday, all four workers in the cluster restarted after running for several minutes.
>
> I checked the Storm logs and found this:
>
> 2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
>         at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
>         at backtype.storm.messaging.netty.Client.send(Client.java:185) ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330) ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328) ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]
>         ... 6 common frames omitted
> 2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop died!")
> java.lang.RuntimeException: ("Async loop died!")
>         at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
>         at backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92) [storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> 2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700
> 2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
>
> These logs appeared several times over about twenty minutes, and the cluster recovered after a rebalance operation in the Storm UI.
>
> Any idea why this may be happening?
>
> Best regards!
>
> Xingyu

--
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
