Check your supervisor logs, with the log level set to INFO.
Look for a line like:
“Shutting down and clearing state for id {worker.id}”
That id will match the worker id that appears in your worker’s logs.
The reason for the shutdown will be in the supervisor’s logs, near that line.
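
If the supervisor log is large, a small script can pull out those lines for you. This is only a rough sketch: the function name `find_worker_shutdowns` is made up for illustration, and the only thing it relies on is the phrase quoted above followed by the worker id. Whatever follows the id on the line is assumed to be separated by a period or whitespace.

```python
import re

# Matches the phrase quoted above and captures the id that follows it.
# Assumption: the id contains no whitespace and may be followed by a period.
SHUTDOWN_RE = re.compile(r"Shutting down and clearing state for id (\S+?)\.?(?:\s|$)")


def find_worker_shutdowns(lines, worker_id=None):
    """Return (worker_id, line) pairs for supervisor shutdown messages.

    If worker_id is given, only messages for that worker are returned.
    """
    hits = []
    for line in lines:
        match = SHUTDOWN_RE.search(line)
        if match is None:
            continue
        found_id = match.group(1)
        if worker_id is None or found_id == worker_id:
            hits.append((found_id, line.rstrip("\n")))
    return hits
```

You would call it with the lines of the supervisor log file (the path depends on your installation) and, optionally, the worker id printed in the worker's own "Shutting down worker" line. Then read the supervisor log entries just before each hit for the reason.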
From: SuXingyu <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, May 7, 2015 at 02:44
To: "[email protected]" <[email protected]>
Subject: Storm worker died - Client is being closed, and does not take requests any more
Hi all,
I am running a Kafka-Storm cluster in distributed mode. There are four
supervisors in my Storm cluster. Most of the time it works fine. But after
I redeployed the topology yesterday, I found that all four workers in the Storm
cluster restarted after running for several minutes.
I checked the Storm logs and found this:
2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
    at backtype.storm.messaging.netty.Client.send(Client.java:185) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]
    ... 6 common frames omitted
2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop died!")
java.lang.RuntimeException: ("Async loop died!")
    at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
    at backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92) [storm-core-0.9.3.jar:0.9.3]
    at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700
2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
These logs appeared several times over about twenty minutes, and the cluster
recovered after a rebalance operation in the Storm UI.
Any idea why this may be happening?
Best regards,
Xingyu