Hi,
I increased the netty wait time in storm.yaml and it seemed like the issue
was solved.
storm.messaging.netty.max_retries: 300
storm.messaging.netty.max_wait_ms: 10000
storm.messaging.netty.min_wait_ms: 4000
But I got the same error again after a few hours. So the above fix just
delayed the error by a few hours. Can anyone suggest a permanent fix for
this?
Here are my worker logs at the time of failure -
2014-09-05 15:40:20 o.a.z.ZooKeeper [INFO] Initiating client connection,
connectString=stormSlave1:2181, sessionTimeout=20000
watcher=org.apache.curator.ConnectionState@57a01ca7
2014-09-05 15:40:20 o.a.z.ClientCnxn [INFO] Opening socket connection to
server stormSlave1/192.168.1.57:2181. Will not attempt to authenticate
using SASL (unknown error)
2014-09-05 15:40:20 o.a.z.ClientCnxn [INFO] Socket connection established
to stormSlave1/192.168.1.57:2181, initiating session
2014-09-05 15:40:20 b.s.d.executor [INFO] Loading executor __system:[-1 -1]
2014-09-05 15:40:20 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-09-05 15:40:20 o.a.z.ZooKeeper [INFO] Initiating client connection,
connectString=stormSlave1:2181 sessionTimeout=20000
watcher=org.apache.curator.ConnectionState@4d49e1ca
2014-09-05 15:40:20 b.s.d.executor [INFO] Loaded executor tasks
__system:[-1 -1]
2014-09-05 15:40:20 o.a.z.ClientCnxn [INFO] Opening socket connection to
server stormSlave1/192.168.1.57:2181. Will not attempt to authenticate
using SASL (unknown error)
2014-09-05 15:40:20 o.a.z.ClientCnxn [INFO] Socket connection established
to stormSlave1/192.168.1.57:2181, initiating session
2014-09-05 15:40:20 b.s.d.executor [INFO] Finished loading executor
__system:[-1 -1]
2014-09-05 15:40:20 b.s.d.worker [INFO] Launching receive-thread for
821e9e5c-f2de-4a5b-892a-b74ad1f6fae6:6700
2014-09-05 15:40:20 b.s.d.executor [INFO] Preparing bolt __system:(-1)
2014-09-05 15:40:20 o.a.z.ClientCnxn [INFO] Session establishment complete
on server stormSlave1/192.168.1.57:2181, sessionid = 0x148448b63260158,
negotiated timeout = 20000
2014-09-05 15:40:20 o.a.c.f.s.ConnectionStateManager [INFO] State change:
CONNECTED
2014-09-05 15:40:20 o.a.c.f.s.ConnectionStateManager [WARN] There are no
ConnectionStateListeners registered.
2014-09-05 15:40:20 b.s.d.executor [INFO] Prepared bolt __system:(-1)
2014-09-05 15:40:20 b.s.m.n.Server [INFO] Create Netty Server
Netty-server-localhost-6700, buffer_size: 5242880, maxWorkers: 1
2014-09-05 15:40:20 o.a.z.ClientCnxn [INFO] Session establishment complete
on server stormSlave1/192.168.1.57:2181, sessionid = 0x148448b63260159,
negotiated timeout = 20000
2014-09-05 15:40:20 o.a.c.f.s.ConnectionStateManager [INFO] State change:
CONNECTED
2014-09-05 15:40:20 o.a.c.f.s.ConnectionStateManager [WARN] There are no
ConnectionStateListeners registered.
2014-09-05 15:40:20 b.s.m.loader [INFO] Starting receive-thread: [stormId:
popeye-3-1409900360, port: 6700, thread-id: 0 ]
2014-09-05 15:40:20 s.k.DynamicBrokersReader [INFO] Read partition info
from zookeeper:
GlobalPartitionInformation{partitionMap={0=stormMaster:9093,
1=stormMaster:9094}}
2014-09-05 15:40:20 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
2014-09-05 15:40:20 o.a.z.ZooKeeper [INFO] Initiating client connection,
connectString=stormSlave1:2181 sessionTimeout=20000
watcher=org.apache.curator.ConnectionState@28ebe0a7
2014-09-05 15:40:20 o.a.z.ClientCnxn [INFO] Opening socket connection to
server stormSlave1/192.168.1.57:2181. Will not attempt to authenticate
using SASL (unknown error)
2014-09-05 15:40:20 o.a.z.ClientCnxn [INFO] Socket connection established
to stormSlave1/192.168.1.57:2181, initiating session
2014-09-05 15:40:20 b.s.d.executor [INFO] Opened spout kafkaSpout:(7)
2014-09-05 15:40:20 b.s.d.executor [INFO] Activating spout kafkaSpout:(7)
2014-09-05 15:40:20 s.k.ZkCoordinator [INFO] Task [1/1] Refreshing
partition manager connections
2014-09-05 15:40:20 o.a.z.ClientCnxn [INFO] Session establishment complete
on server stormSlave1/192.168.1.57:2181, sessionid = 0x148448b6326015a,
negotiated timeout = 20000
2014-09-05 15:40:20 o.a.c.f.s.ConnectionStateManager [INFO] State change:
CONNECTED
2014-09-05 15:40:20 o.a.c.f.s.ConnectionStateManager [WARN] There are no
ConnectionStateListeners registered.
2014-09-05 15:40:20 s.k.DynamicBrokersReader [INFO] Read partition info
from zookeeper:
GlobalPartitionInformation{partitionMap={0=stormMaster:9093,
1=stormMaster:9094}}
2014-09-05 15:40:20 s.k.KafkaUtils [INFO] Task [1/1] assigned
[Partition{host=stormMaster:9093, partition=0},
Partition{host=stormMaster:9094, partition=1}]
2014-09-05 15:40:20 s.k.ZkCoordinator [INFO] Task [1/1] Deleted partition
managers: []
2014-09-05 15:40:20 s.k.ZkCoordinator [INFO] Task [1/1] New partition
managers: [Partition{host=stormMaster:9094, partition=1},
Partition{host=stormMaster:9093, partition=0}]
On Thu, Sep 4, 2014 at 9:14 PM, Pavan Jakati G <[email protected]>
wrote:
> Check what worker log has to stay. May be storm user permission issue.
> Try running the command that's there in supervisor log and validate the
> output .
>
>
>
> -----Original Message-----
> From: Palak Shah [mailto:[email protected] <[email protected]>]
> Sent: Thu 9/4/2014 8:54 PM
> To: [email protected]
> Subject: Storm Worker process disconnects
>
> I am running a Topology on Storm Cluster that uses the Kafka Spout. One of
> my supervisor disconnects after a certain amount of time and is restarted
> again. My supervisor logs for that node report the following -
>
> Does anybody have some clue about this?
>
> 2014-09-04 19:10:11 b.s.d.supervisor [INFO] Launching worker with command:
> 'java' '-server' '-verbose:gc' '-XX:+PrintGCTimeStamps'
> '-XX:+PrintGCDetails' '-Dcom.sun.management.jmxremote'
> '-Dcom.sun.management.jmxremote.ssl=false'
> '-Dcom.sun.management.jmxremote.authenticate=false'
> '-Dcom.sun.management.jmxremote.port=16700'
>
> '-Djava.library.path=/tmp/stormtmp/supervisor/stormdist/palak-1-1409837530/resources/Linux-amd64:/tmp/stormtmp/supervisor/stormdist/palak-1-1409837530/resources:/usr/lib/jvm/java-7-oracle/lib'
> '-Dlogfile.name=worker-6700.log'
> '-Dstorm.home=/home/stormcluster/Storm/apache-storm-0.9.2-incubating'
>
> '-Dlogback.configurationFile=/home/stormcluster/Storm/apache-storm-0.9.2-incubating/logback/cluster.xml'
> '-Dstorm.id=palak-1-1409837530'
> '-Dworker.id=88c6e738-cdc8-4b17-9f0d-849205dc9458' '-Dworker.port=6700'
> '-cp'
>
> '/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jgrapht-core-0.9.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/carbonite-1.4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-lang-2.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/snakeyaml-1.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-core-1.1.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/core.incubator-0.1.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-jetty-adapter-0.3.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-servlet-0.3.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-fileupload-1.2.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/netty-3.6.3.Final.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/math.numeric-tower-0.0.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/zookeeper-3.4.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/storm-core-0.9.2-incubating.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/minlog-1.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/disruptor-2.10.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/chill-java-0.3.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/hiccup-0.3.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-devel-0.3.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clout-1.0.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/curator-framework-2.4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/reflectasm-1.07-shaded.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/tools.cli-0.2.4.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/objenesis-1.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/httpcore-4.3.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/tools.logging-0.2.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jline-2.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/logback-classic-1.0.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-exec-1.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/httpclient-4.3.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/netty-3.2.2.Final.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clj-time-0.4.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/compojure-1.1.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/asm-4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jetty-util-6.1.26.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-io-2.4.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-codec-1.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/kryo-2.21.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/servlet-api-2.5-20081211.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-logging-1.1.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clj-stacktrace-0.2.4.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/slf4j-api-1.6.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/tools.macro-0.1.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/joda-time-2.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/log4j-over-slf4j-1.6.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/servlet-api-2.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jetty-6.1.26.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/curator-client-2.4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/json-simple-1.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/guava-13.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/logback-core-1.0.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clojure-1.5.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/conf:/tmp/stormtmp/supervisor/stormdist/palak-1-1409837530/stormjar.jar'
> 'backtype.storm.daemon.worker' 'palak-1-1409837530'
> '2350025d-9305-4481-8f99-23ef0cf3fdad' '6700'
> '88c6e738-cdc8-4b17-9f0d-849205dc9458'
> 2014-09-04 19:10:13 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:13 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:14 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:14 b.s.d.supervisor [INFO] Removing code for storm id
> palak-1-1409837530
> 2014-09-04 19:10:14 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:15 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:15 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:24 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:24 b.s.d.supervisor [INFO] Downloading code for storm id
> palak-1-1409837530 from /tmp/stormtmp/nimbus/stormdist/palak-1-1409837530
> 2014-09-04 19:10:25 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:27 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:28 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:28 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:28 b.s.d.supervisor [INFO] Finished downloading code for
> storm id palak-1-1409837530 from
> /tmp/stormtmp/nimbus/stormdist/palak-1-1409837530
> 2014-09-04 19:10:29 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:29 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:30 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:35 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:36 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:10:36 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:11:21 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:11:21 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:11:22 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:11:22 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:11:23 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:11:23 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:11:24 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:11:24 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:11:25 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:11:25 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:12:10 b.s.d.supervisor [INFO]
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 still hasn't started
> 2014-09-04 19:12:11 b.s.d.supervisor [INFO] Worker
> 88c6e738-cdc8-4b17-9f0d-849205dc9458 failed to start
> 2014-09-04 19:12:11 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id 88c6e738-cdc8-4b17-9f0d-849205dc9458. Current supervisor time:
> 1409838131. State: :not-started, Heartbeat: nil
> 2014-09-04 19:12:11 b.s.d.supervisor [INFO] Shutting down
> 2350025d-9305-4481-8f99-23ef0cf3fdad:88c6e738-cdc8-4b17-9f0d-849205dc9458
> 2014-09-04 19:12:11 b.s.d.supervisor [INFO] Shut down
> 2350025d-9305-4481-8f99-23ef0cf3fdad:88c6e738-cdc8-4b17-9f0d-849205dc9458
> 2014-09-04 19:12:11 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "palak-1-1409837530", :executors ([5 5] [1 1])} for this supervisor
> 2350025d-9305-4481-8f99-23ef0cf3fdad on port 6700 with id
> c391936c-8535-4040-9ad2-241cf46b5efa
> 2014-09-04 19:12:11 b.s.d.supervisor [INFO] Launching worker with command:
> 'java' '-server' '-verbose:gc' '-XX:+PrintGCTimeStamps'
> '-XX:+PrintGCDetails' '-Dcom.sun.management.jmxremote'
> '-Dcom.sun.management.jmxremote.ssl=false'
> '-Dcom.sun.management.jmxremote.authenticate=false'
> '-Dcom.sun.management.jmxremote.port=16700'
>
> '-Djava.library.path=/tmp/stormtmp/supervisor/stormdist/palak-1-1409837530/resources/Linux-amd64:/tmp/stormtmp/supervisor/stormdist/palak-1-1409837530/resources:/usr/lib/jvm/java-7-oracle/lib'
> '-Dlogfile.name=worker-6700.log'
> '-Dstorm.home=/home/stormcluster/Storm/apache-storm-0.9.2-incubating'
>
> '-Dlogback.configurationFile=/home/stormcluster/Storm/apache-storm-0.9.2-incubating/logback/cluster.xml'
> '
> .
> .
> .
>
> The information contained in this transmission may contain privileged
> and confidential information of Microland Limited, including information
> protected by privacy laws. It is intended only for the use of Microland
> Limited. If you are not the intended recipient, you are hereby notified
> that any review, dissemination, distribution, or duplication of this
> communication is strictly prohibited. If you are not the intended
> recipient, please contact the sender by reply email and destroy all copies
> of the original message.
>