I checked my supervisor log and found entries like "Shutting down and clearing
state for id {worker.id}", but I still can't find the cause of the worker
shutting down.
Here are the supervisor logs. It seems that the worker's JVM exited before the
supervisor killed it (a quick way to line the two logs up by timestamp is
sketched after the excerpt below).
2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with
assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
"message_storm-101-17-1430903865", :executors ...} for this supervisor
95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id
95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with command:
'/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g' '-DServer=mblog'
'-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m' '-XX:MaxPermSize=128m'
'-XX:MaxTenuringThreshold=4' '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8'
'-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent'
'-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps'
'-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails' '-XX:+PrintGCTimeStamps'
'-XX:+PrintGCApplicationStoppedTime' '-XX:+PrintGCApplicationConcurrentTime'
'-Xloggc:/data0/logs/storm/gc.log'
'-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib'
'-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm'
'-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs'
'-Dlogback.configurationFile=/data0/storm/logback/cluster.xml'
'-Dstorm.id=message_storm-101-17-1430903865'
'-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701' '-cp'
'/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar'
'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865'
'95d77940-08b2-4f7a-9660-03d8f61a2529' '6701'
'95722030-7785-40d6-a47a-cc072c4c9d83'
2015-05-06 17:21:47:850 b.s.d.supervisor [INFO]
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:350 b.s.d.supervisor [INFO]
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:850 b.s.d.supervisor [INFO]
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:351 b.s.d.supervisor [INFO]
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:851 b.s.d.supervisor [INFO]
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:352 b.s.d.supervisor [INFO]
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:852 b.s.d.supervisor [INFO]
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:352 b.s.d.supervisor [INFO]
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:853 b.s.d.supervisor [INFO]
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:52:353 b.s.d.supervisor [INFO]
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing
state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time:
1430904337. State: :disallowed, Heartbeat:
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305, :storm-id
"message_storm-101-17-1430903865", :executors #{...}, :port 6701}
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down
95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187.
Process is probably already dead.
2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187.
Process is probably already dead.
2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down
95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
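To double-check the ordering, something along these lines can print the supervisor entries for the worker id next to the worker's fatal entries, sorted by timestamp. This is only a rough sketch: the paths and file names are assumptions taken from the launch command above (the supervisor log name in particular is a guess), so adjust them to your own layout.

#!/usr/bin/env python3
# Rough sketch: print, in timestamp order, the supervisor entries that mention
# one worker id next to the worker's fatal entries, to confirm which came first.
# File names below are assumptions based on the launch command above (the
# supervisor log name is a guess); adjust them to your own layout.
import re

WORKER_ID = "95722030-7785-40d6-a47a-cc072c4c9d83"
SOURCES = [
    ("/data0/storm/logs/supervisor.log", WORKER_ID),
    ("/data0/storm/logs/worker-6701.log", "Async loop died"),
    ("/data0/storm/logs/worker-6701.log", "Halting process"),
]
# Every log entry starts with a timestamp like "2015-05-06 17:21:47:847".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3})")

hits = set()
for path, needle in SOURCES:
    try:
        with open(path, errors="replace") as f:
            for line in f:
                m = TS.match(line)
                if m and needle in line:
                    hits.add((m.group(1), path, line.rstrip()))
    except OSError as e:
        print(f"skipping {path}: {e}")

for ts, path, line in sorted(hits):
    print(f"{path}: {line}")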
From: [email protected]
To: [email protected]
Subject: Re: Storm worker died - Client is being closed, and does not take
requests any more
Date: Thu, 7 May 2015 15:15:50 +0000
Look into your supervisor logs, with them set to INFO.
Look for something like:
“Shutting down and clearing state for id {worker.id}”
That id will match your worker’s logs.
The reason for the shutdown will be in the supervisor’s logs.
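For example, a quick scan along these lines (the log path below is just a placeholder) pulls the worker id and the State keyword out of each of those entries; the State value is usually the fastest hint at the reason (roughly, :timed-out means the supervisor stopped seeing heartbeats, :disallowed means that worker is no longer in the supervisor's assignment):

#!/usr/bin/env python3
# Sketch only: extract the worker id and shutdown state from a supervisor log.
import re

PATTERN = re.compile(
    r"Shutting down and clearing state for id (?P<worker>\S+)\. "
    r"Current supervisor time: \d+\. State: (?P<state>[^,]+),"
)

with open("/path/to/supervisor.log", errors="replace") as f:  # placeholder path
    for line in f:
        m = PATTERN.search(line)
        if m:
            print(m.group("worker"), m.group("state"))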
From: SuXingyu <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, May 7, 2015 at 02:44
To: "[email protected]" <[email protected]>
Subject: Storm worker died - Client is being closed, and does not take requests
any more
Hi all,
I am running a Kafka-Storm cluster in distributed mode. There are four
supervisors in my Storm cluster. Most of the time it works fine, but after I
redeployed a topology yesterday, I found that all four workers in the Storm
cluster restarted after running for several minutes.
I checked the Storm logs and found this:
2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
    at backtype.storm.messaging.netty.Client.send(Client.java:185) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]
    ... 6 common frames omitted
2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop died!")
java.lang.RuntimeException: ("Async loop died!")
    at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
    at backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92) [storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker
message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700
2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client
environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
These logs appeared several times over about twenty minutes, and the cluster
recovered after a rebalance operation in the Storm UI.
Any idea why this may be happening?
Best regards!
Xingyu