If one or more workers were moved to other slots, you may hit this issue.

https://issues.apache.org/jira/browse/STORM-737
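
For illustration only (this is not Storm's actual code), the failure mode is a
plain two-thread race: a transfer thread keeps sending through a client
reference while a connection-refresh thread, reacting to the new assignment,
has already closed that client. Below is a minimal, self-contained Java sketch
of that pattern, with every name hypothetical:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a messaging client. Once close() is called,
// send() refuses further work, mirroring the "Client is being closed"
// RuntimeException in the worker's stack trace further down the thread.
class FakeClient {
    private final AtomicBoolean closing = new AtomicBoolean(false);

    void send(byte[] payload) {
        if (closing.get()) {
            throw new RuntimeException(
                "Client is being closed, and does not take requests any more");
        }
        // ... write payload to the remote worker ...
    }

    void close() {
        closing.set(true);
    }
}

public class TransferRace {
    // endpoint -> client, refreshed whenever the assignment changes
    static final Map<String, FakeClient> connections = new ConcurrentHashMap<>();

    public static void main(String[] args) throws InterruptedException {
        connections.put("node2:6701", new FakeClient());

        // Transfer thread: holds on to the client it looked up and keeps sending.
        Thread transfer = new Thread(() -> {
            FakeClient stale = connections.get("node2:6701");
            while (true) {
                stale.send(new byte[0]);   // throws once the refresh closes it
            }
        });

        // Refresh thread: the executors were moved to another slot, so the
        // old connection is removed and closed.
        Thread refresh = new Thread(() -> {
            FakeClient old = connections.remove("node2:6701");
            old.close();
        });

        transfer.start();
        Thread.sleep(10);
        refresh.start();
        transfer.join();
        refresh.join();
    }
}

The point is only that, once another thread has closed the client, any
in-flight send fails with exactly that message; the async loop then dies and
the worker halts itself, which is what the logs further down show.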

Hope this helps.


2015-05-08 18:53 GMT+09:00 SuXingyu <[email protected]>:

> I am running my cluster using Storm 0.9.3.  This exception was caused by
> sending a message while the Netty client is closing. But I still don't know
> why the Netty client closed.
>
> ------------------------------
>  From: [email protected]
> Date: Thu, 7 May 2015 23:34:16 -0700
>
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> To: [email protected]
>
> What version of Storm are you running?   I had significant issues with
> Storm v0.9.3 having what I'll term "congestive collapse" related to the
> Netty implementation.  Workers would continually die.  The problem was
> resolved by switching the transport layer from Netty back to the original
> ZeroMQ implementation, but that was not trivial.
>
> - Erik
>
> On Thu, May 7, 2015 at 11:28 PM, SuXingyu <[email protected]> wrote:
>
> Yes, I've found the logs you mentioned in the Nimbus log. But I still can't
> find any indication in the worker logs. The first log generated by Storm
> after the worker started is "Async loop died".
>
> ------------------------------
> From: [email protected]
> Date: Thu, 7 May 2015 22:56:41 -0700
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> To: [email protected]
>
>
> The "State" being ":disallowed" usually indicates that the worker's
> executors' slot assignment has changed.  If you look at the Nimbus logs, I
> would expect to see logs stating that the affected executors were noticed
> to be "not alive" by the Nimbus, and then an immediate "Setting new
> assignment for topology id" log line as the Nimbus creates a new assignment
> for this topology.  As for why the Worker itself died -- is there no
> indication in the worker logs?
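> 
> A rough Java paraphrase of that check may help; this is a simplified
> illustration with made-up types, not the actual Clojure in the supervisor
> daemon, which also takes other state into account (approved worker ids,
> configured timeouts, and so on):
> 
> import java.util.Objects;
> import java.util.Set;
> 
> public class SupervisorCheck {
>     enum WorkerState { NOT_STARTED, DISALLOWED, TIMED_OUT, VALID }
> 
>     static class WorkerHeartbeat {
>         String stormId; Set<Integer> executors; int port; long timeSecs;
>     }
> 
>     static class LocalAssignment {
>         String stormId; Set<Integer> executors;
>     }
> 
>     // Classify one local worker: no heartbeat yet -> NOT_STARTED
>     // ("still hasn't started"); a heartbeat that no longer matches the
>     // slot's assignment -> DISALLOWED; a stale heartbeat -> TIMED_OUT;
>     // otherwise VALID.
>     static WorkerState classify(WorkerHeartbeat hb, LocalAssignment assigned,
>                                 int port, long nowSecs, long timeoutSecs) {
>         if (hb == null) {
>             return WorkerState.NOT_STARTED;
>         }
>         boolean matches = assigned != null
>                 && Objects.equals(hb.stormId, assigned.stormId)
>                 && Objects.equals(hb.executors, assigned.executors)
>                 && hb.port == port;
>         if (!matches) {
>             return WorkerState.DISALLOWED;   // executors reassigned elsewhere
>         }
>         if (nowSecs - hb.timeSecs > timeoutSecs) {
>             return WorkerState.TIMED_OUT;
>         }
>         return WorkerState.VALID;
>     }
> }
> 
> A DISALLOWED result is what leads to the "Shutting down and clearing state
> for id ..." line in the supervisor log quoted below.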
>
> - Erik
>
> On Thu, May 7, 2015 at 8:36 PM, SuXingyu <[email protected]> wrote:
>
> I checked my supervisor log and found entries like "Shutting down and
> clearing state for id {worker.id}". But I still can't find the cause of the
> worker shutting down.
>
> Here are the logs from the supervisor. It seems that the worker's JVM exited
> before the supervisor killed it.
>
> 2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "message_storm-101-17-1430903865", :executors ...} for this supervisor
> 95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id
> 95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with
> command: '/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g'
> '-DServer=mblog' '-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m'
> '-XX:MaxPermSize=128m' '-XX:MaxTenuringThreshold=4'
> '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8'
> '-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent'
> '-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps'
> '-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails'
> '-XX:+PrintGCTimeStamps' '-XX:+PrintGCApplicationStoppedTime'
> '-XX:+PrintGCApplicationConcurrentTime' '-Xloggc:/data0/logs/storm/gc.log'
> '-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib'
> '-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm'
> '-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs'
> '-Dlogback.configurationFile=/data0/storm/logback/cluster.xml'
> '-Dstorm.id=message_storm-101-17-1430903865'
> '-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701'
> '-cp'
> '/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar'
> 'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865'
> '95d77940-08b2-4f7a-9660-03d8f61a2529' '6701'
> '95722030-7785-40d6-a47a-cc072c4c9d83'
> 2015-05-06 17:21:47:850 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:350 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:48:850 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:351 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:49:851 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:352 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:50:852 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:352 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:51:853 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:21:52:353 b.s.d.supervisor [INFO]
> 95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time:
> 1430904337. State: :disallowed, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305,
> :storm-id "message_storm-101-17-1430903865", :executors #{...}, :port 6701}
> 2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down
> 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
> 2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187.
> Process is probably already dead.
> 2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187.
> Process is probably already dead.
> 2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down
> 95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
>
>
>
>
> ------------------------------
> From: [email protected]
> To: [email protected]
> Subject: Re: Storm worker died - Client is being closed, and does not take
> requests any more
> Date: Thu, 7 May 2015 15:15:50 +0000
>
>
> Look into your supervisor logs, with them set to INFO.
>
>  Look for something like:
> “Shutting down and clearing state for id {worker.id}”
>
>  That id will match your worker’s logs.
>
>  The reason for the shutdown will be in the supervisor’s logs.
>
>
>
>
>   From: SuXingyu <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, May 7, 2015 at 02:44
> To: "[email protected]" <[email protected]>
> Subject: Storm worker died - Client is being closed, and does not take
> requests any more
>
>   Hi all,
>
> I am running a Kafka-Storm cluster in distributed mode. There are four
> supervisors in my Storm cluster. Most of the time it works fine, but after I
> redeployed a topology yesterday, I found that all four workers in the
> cluster restarted after running for several minutes.
>
> I checked the Storm log and found this:
>
> 2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: Client is being
> closed, and does not take requests any more
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> Caused by: java.lang.RuntimeException: Client is being closed, and does
> not take requests any more
>         at backtype.storm.messaging.netty.Client.send(Client.java:185)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58)
> ~[storm-core-0.9.3.jar:0.9.3]
>         at
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
> ~[storm-core-0.9.3.jar:0.9.3]
>         ... 6 common frames omitted
> 2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop
> died!")
> java.lang.RuntimeException: ("Async loop died!")
>         at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
> [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.RestFn.invoke(RestFn.java:423)
> [clojure-1.5.1.jar:na]
>         at
> backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92)
> [storm-core-0.9.3.jar:0.9.3]
>         at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473)
> [storm-core-0.9.3.jar:0.9.3]
>         at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> 2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker
> message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700
> 2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client
> environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
>
> These logs appeared several times over about twenty minutes, and the
> cluster recovered after a rebalance operation in the Storm UI.
>
> Any idea why this may be happening?
>
> Best regards!
>
> Xingyu
>


-- 
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
