The "State" being ":disallowed" usually indicates that the worker's
executors' slot assignment has changed.  If you look at the Nimbus logs, I
would expect to see logs stating that the affected executors were noticed
to be "not alive" by the Nimbus, and then an immediate "Setting new
assignment for topology id" log line as the Nimbus creates a new assignment
for this topology.  As for why the Worker itself died -- is there no
indication in the worker logs?

- Erik

On Thu, May 7, 2015 at 8:36 PM, SuXingyu <[email protected]> wrote:

> I checked my supervisor log and found entries like "Shutting down and
> clearing state for id {worker.id}", but I still can't find the cause of
> the worker shutting down.
>
> Here are the logs from the supervisor. It seems that the worker's JVM
> exited before the supervisor killed it.
>
> 2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "message_storm-101-17-1430903865", :executors ...} for this supervisor
> 95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id
> 95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with
> command: '/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g'
> '-DServer=mblog' '-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m'
> '-XX:MaxPermSize=128m' '-XX:MaxTenuringThreshold=4'
> '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8'
> '-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent'
> '-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps'
> '-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails'
> '-XX:+PrintGCTimeStamps' '-XX:+PrintGCApplicationStoppedTime'
> '-XX:+PrintGCApplicationConcurrentTime' '-Xloggc:/data0/logs/storm/gc.log'
> '-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib'
> '-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm'
> '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs'
> '-Dlogback.configurationFile=/data0/storm/logback/cluster.xml'
> '-Dstorm.id=message_storm-101-17-1430903865'
> '-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701'
> '-cp'
> '/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar'
> 'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865'
> '95d77940-08b2-4f7a-9660-03d8f61a2529' '6701'
> '95722030-7785-40d6-a47a-cc072c4c9d83'
> 2015-05-06 17:21:47:850 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:350 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:850 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:351 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:851 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:352 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:852 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:352 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:853 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:52:353 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time:
> 1430904337. State: :disallowed, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305,
> :storm-id "message_storm-101-17-1430903865", :executors #{...}, :port 6701}
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down
> 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187.
> Process is probably already dead.
> 2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187.
> Process is probably already dead.
> 2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down
> 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
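>
> For what it's worth, I can also check the dying worker's own log around
> the error. Per the -Dlogfile.name flag in the launch command above, the
> worker on port 6700 should log to /data0/storm/logs/worker-6700.log (an
> assumption -- I am inferring the file name from the port pattern), e.g.:
>
>     grep -iE 'netty|connection' /data0/storm/logs/worker-6700.log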
>
> ------------------------------
> From: [email protected]
> To: [email protected]
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> Date: Thu, 7 May 2015 15:15:50 +0000
>
>
> Look in your supervisor logs, with the log level set to INFO.
>
> Look for something like:
> "Shutting down and clearing state for id {worker.id}"
>
> That id will match your worker's logs.
>
> The reason for the shutdown will be in the supervisor's logs.
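>
> For example (a sketch -- $STORM_LOG_DIR here stands for whatever your
> storm.log.dir points at, and supervisor.log is the usual file name):
>
>     grep 'Shutting down and clearing state for id' "$STORM_LOG_DIR"/supervisor.log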
>
>   From: SuXingyu <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, May 7, 2015 at 02:44
> To: "[email protected]" <[email protected]>
> Subject: Storm worker died - Client is being closed, and does not take
> requests any more
>
> Hi all,
>
> I am running a Kafka-Storm cluster in distributed mode. There are four
> supervisors in my Storm cluster. Most of the time it works fine, but
> after I redeployed a topology yesterday, I found that all four workers in
> the cluster restarted after running for several minutes.
>
> I checked the Storm logs and found this:
>
> 2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: Client is being
> closed, and does not take requests any more
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> Caused by: java.lang.RuntimeException: Client is being closed, and does
> not take requests any more
>         at backtype.storm.messaging.netty.Client.send(Client.java:185)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
> ~[storm-core-0.9.3.jar:0.9.3]
>         ... 6 common frames omitted
> 2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop
> died!")
> java.lang.RuntimeException: ("Async loop died!")
>         at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
> [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.RestFn.invoke(RestFn.java:423)
> [clojure-1.5.1.jar:na]
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92)
> [storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473)
> [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> 2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker
> message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700
> 2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client
> environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
>
> These logs appeared several times over about twenty minutes, and the
> cluster recovered after a rebalance operation in the Storm UI.
>
> Any idea why this may be happening?
>
> Best regards!
>
> Xingyu
>
