It seems that this issue matches our situation best. I am wondering under what 
conditions a worker would be moved to other slots, apart from a rebalance. Is 
moving workers a normal operation for Nimbus?

Anyway, thank you sincerely for your help!
 
Date: Fri, 8 May 2015 20:35:50 +0900
Subject: Re: Storm worker died - Client is being closed, and does not take 
requests any more
From: [email protected]
To: [email protected]

If one or more workers were moved to other slots, you may hit this issue.
https://issues.apache.org/jira/browse/STORM-737

Hope this helps.

2015-05-08 18:53 GMT+09:00 SuXingyu <[email protected]>:



I am running my cluster on Storm 0.9.3. This exception was caused by sending a 
message while the Netty client was closing, but I still don't know why the 
Netty client was closed.

 From: [email protected]
Date: Thu, 7 May 2015 23:34:16 -0700
Subject: Re: Storm worker died - Client is being closed, and does not take 
requests any more
To: [email protected]

What version of Storm are you running? I had significant issues with Storm 
v0.9.3, hitting what I'll term "congestive collapse" related to the Netty 
implementation: workers would continually die. The problem was resolved by 
switching the transport layer from Netty back to the original ZeroMQ 
implementation, but that was not trivial.
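
For reference, the transport is selected via storm.messaging.transport in 
storm.yaml. A rough sketch of the switch, assuming a ZeroMQ transport module is 
available on the classpath (it is no longer bundled with storm-core in 0.9.x), 
looks something like this:

    # storm.yaml on every node -- Netty is the 0.9.3 default:
    # storm.messaging.transport: "backtype.storm.messaging.netty.Context"
    # Pre-Netty default shown below; the exact class depends on the zmq module you install:
    storm.messaging.transport: "backtype.storm.messaging.zmq"

The setting has to match on every node, and the daemons need a restart to pick 
it up.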
- Erik
On Thu, May 7, 2015 at 11:28 PM, SuXingyu <[email protected]> wrote:



Yes, I've found the logs you mentioned in the Nimbus log, but I still can't 
find any indication in the worker logs. The first log generated by Storm after 
the worker starts is "Async loop died".

From: [email protected]
Date: Thu, 7 May 2015 22:56:41 -0700
Subject: Re: Storm worker died - Client is being closed, and does not take 
requests any more
To: [email protected]

The "State" being ":disallowed" usually indicates that the worker's executors' 
slot assignment has changed.  If you look at the Nimbus logs, I would expect to 
see logs stating that the affected executors were noticed to be "not alive" by 
the Nimbus, and then an immediate "Setting new assignment for topology id" log 
line as the Nimbus creates a new assignment for this topology.  As for why the 
Worker itself died -- is there no indication in the worker logs?
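
For example, grepping the Nimbus log (typically nimbus.log under storm.log.dir 
-- /data0/storm/logs in your worker launch command, assuming Nimbus is 
configured the same way) for those two messages should show when and why the 
reassignment happened:

    grep -E "not alive|Setting new assignment for topology id" /data0/storm/logs/nimbus.log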
- Erik
On Thu, May 7, 2015 at 8:36 PM, SuXingyu <[email protected]> wrote:



I checked my supervisor log and found entries like "Shutting down and clearing 
state for id {worker.id}", but I still can't find the cause of the worker 
shutting down.

Here are the supervisor logs. It seems that the worker's JVM exited before the 
supervisor killed it.

2015-05-06 17:21:47:847 b.s.d.supervisor [INFO] Launching worker with 
assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id 
"message_storm-101-17-1430903865", :executors ...} for this supervisor 
95d77940-08b2-4f7a-9660-03d8f61a2529 on port 6701 with id 
95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:21:47:849 b.s.d.supervisor [INFO] Launching worker with command: 
'/usr/java/bin/java' '-server' '-Xmx8g' '-Xms8g' '-Xmn2g' '-DServer=mblog' 
'-DNODE_POOL_NAME=openapi_storm-yf' '-XX:PermSize=128m' '-XX:MaxPermSize=128m' 
'-XX:MaxTenuringThreshold=4' '-XX:+UseConcMarkSweepGC' '-XX:SurvivorRatio=8' 
'-XX:CMSInitiatingOccupancyFraction=70' '-XX:+ExplicitGCInvokesConcurrent' 
'-XX:+PrintCommandLineFlags' '-XX:+PrintGCDateStamps' 
'-XX:+PrintTenuringDistribution' '-XX:+PrintGCDetails' '-XX:+PrintGCTimeStamps' 
'-XX:+PrintGCApplicationStoppedTime' '-XX:+PrintGCApplicationConcurrentTime' 
'-Xloggc:/data0/logs/storm/gc.log' 
'-Djava.library.path=storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources/Linux-amd64:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/resources:/usr/local/lib:/opt/local/lib:/usr/lib'
 '-Dlogfile.name=worker-6701.log' '-Dstorm.home=/data0/storm' 
'-Dstorm.conf.file=' '-Dstorm.options=' '-Dstorm.log.dir=/data0/storm/logs' 
'-Dlogback.configurationFile=/data0/storm/logback/cluster.xml' 
'-Dstorm.id=message_storm-101-17-1430903865' 
'-Dworker.id=95722030-7785-40d6-a47a-cc072c4c9d83' '-Dworker.port=6701' '-cp' 
'/data0/storm/lib/logback-core-1.0.13.jar:/data0/storm/lib/tools.logging-0.2.3.jar:/data0/storm/lib/commons-codec-1.6.jar:/data0/storm/lib/json-simple-1.1.jar:/data0/storm/lib/commons-logging-1.1.3.jar:/data0/storm/lib/tools.cli-0.2.4.jar:/data0/storm/lib/jetty-util-6.1.26.jar:/data0/storm/lib/kryo-2.21.jar:/data0/storm/lib/ring-devel-0.3.11.jar:/data0/storm/lib/commons-lang-2.5.jar:/data0/storm/lib/commons-exec-1.1.jar:/data0/storm/lib/clout-1.0.1.jar:/data0/storm/lib/servlet-api-2.5.jar:/data0/storm/lib/ring-jetty-adapter-0.3.11.jar:/data0/storm/lib/clj-time-0.4.1.jar:/data0/storm/lib/jgrapht-core-0.9.0.jar:/data0/storm/lib/chill-java-0.3.5.jar:/data0/storm/lib/jline-2.11.jar:/data0/storm/lib/tools.macro-0.1.0.jar:/data0/storm/lib/log4j-over-slf4j-1.6.6.jar:/data0/storm/lib/minlog-1.2.jar:/data0/storm/lib/core.incubator-0.1.0.jar:/data0/storm/lib/jetty-6.1.26.jar:/data0/storm/lib/compojure-1.1.3.jar:/data0/storm/lib/math.numeric-tower-0.0.1.jar:/data0/storm/lib/slf4j-api-1.7.5.jar:/data0/storm/lib/hiccup-0.3.6.jar:/data0/storm/lib/objenesis-1.2.jar:/data0/storm/lib/storm-core-0.9.3.jar:/data0/storm/lib/snakeyaml-1.11.jar:/data0/storm/lib/asm-4.0.jar:/data0/storm/lib/carbonite-1.4.0.jar:/data0/storm/lib/commons-fileupload-1.2.1.jar:/data0/storm/lib/ring-core-1.1.5.jar:/data0/storm/lib/logback-classic-1.0.13.jar:/data0/storm/lib/clojure-1.5.1.jar:/data0/storm/lib/reflectasm-1.07-shaded.jar:/data0/storm/lib/ring-servlet-0.3.11.jar:/data0/storm/lib/disruptor-2.10.1.jar:/data0/storm/lib/joda-time-2.0.jar:/data0/storm/lib/commons-io-2.4.jar:/data0/storm/lib/clj-stacktrace-0.2.2.jar:/data0/storm/conf:storm-local/supervisor/stormdist/message_storm-101-17-1430903865/stormjar.jar'
 'backtype.storm.daemon.worker' 'message_storm-101-17-1430903865' 
'95d77940-08b2-4f7a-9660-03d8f61a2529' '6701' 
'95722030-7785-40d6-a47a-cc072c4c9d83'
2015-05-06 17:21:47:850 b.s.d.supervisor [INFO] 
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:350 b.s.d.supervisor [INFO] 
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:48:850 b.s.d.supervisor [INFO] 
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:351 b.s.d.supervisor [INFO] 
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:49:851 b.s.d.supervisor [INFO] 
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:352 b.s.d.supervisor [INFO] 
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:50:852 b.s.d.supervisor [INFO] 
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:352 b.s.d.supervisor [INFO] 
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:51:853 b.s.d.supervisor [INFO] 
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:21:52:353 b.s.d.supervisor [INFO] 
95722030-7785-40d6-a47a-cc072c4c9d83 still hasn't started
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down and clearing 
state for id 95722030-7785-40d6-a47a-cc072c4c9d83. Current supervisor time: 
1430904337. State: :disallowed, Heartbeat: 
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1430904305, :storm-id 
"message_storm-101-17-1430903865", :executors #{...}, :port 6701}
2015-05-06 17:25:37:845 b.s.d.supervisor [INFO] Shutting down 
95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83
2015-05-06 17:25:37:847 b.s.util [INFO] Error when trying to kill 27187. 
Process is probably already dead.
2015-05-06 17:25:38:849 b.s.util [INFO] Error when trying to kill 27187. 
Process is probably already dead.
2015-05-06 17:25:38:853 b.s.d.supervisor [INFO] Shut down 
95d77940-08b2-4f7a-9660-03d8f61a2529:95722030-7785-40d6-a47a-cc072c4c9d83




From: [email protected]
To: [email protected]
Subject: Re: Storm worker died - Client is being closed, and does not take 
requests any more
Date: Thu, 7 May 2015 15:15:50 +0000

Look into your supervisor logs, with them set to INFO.

Look for something like:
“Shutting down and clearing state for id {worker.id}”

That id will match your worker’s logs.

The reason for the shutdown will be in the supervisor’s logs.
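
A quick way to find that entry (the log path below is only an illustration; use 
whatever storm.log.dir points to on the supervisor host) is something like:

    grep "Shutting down and clearing state for id" /path/to/storm/logs/supervisor.log

The matching line also carries the State (for example :disallowed), which tells 
you why the supervisor shut the worker down.
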
From: SuXingyu <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, May 7, 2015 at 02:44
To: "[email protected]" <[email protected]>
Subject: Storm worker died - Client is being closed, and does not take requests any more

Hi all,

I am running a Kafka-Storm cluster in distributed mode. There are four 
supervisors in my Storm cluster. Most of the time it works fine, but after I 
redeployed the topology yesterday, I found that all four workers in the 
cluster restarted after running for several minutes.

I checked the Storm logs and found this:

2015-05-06 17:23:08:934 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
        at backtype.storm.messaging.netty.Client.send(Client.java:185) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]
        ... 6 common frames omitted
2015-05-06 17:23:08:938 b.s.util [ERROR] Halting process: ("Async loop died!")
java.lang.RuntimeException: ("Async loop died!")
        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]
        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
        at backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92) [storm-core-0.9.3.jar:0.9.3]
        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
2015-05-06 17:23:08:957 b.s.d.worker [INFO] Shutting down worker message_storm-101-17-1430903865 95d77940-08b2-4f7a-9660-03d8f61a2529 6700
2015-05-06 17:23:45:487 o.a.s.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT



These logs appeared several times over about twenty minutes, and the cluster 
recovered after a rebalance operation in the Storm UI.

Any idea why this may be happening?

Best regards,

Xingyu

-- 
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
                                          
