Derek,

We are currently running with -Xmx60G and only about 20-30G of that has been 
observed to be used. I'm still observing workers restarted every 2 minutes.
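
Heap usage alone does not show how long collections take, so a check along 
these lines could be run inside a worker JVM to see whether GC time lines up 
with the timeouts. This is only a sketch: the class name GcPauseCheck and the 
30-second window are made up for illustration, and getCollectionTime() reports 
approximate accumulated collection time, which is not identical to 
stop-the-world pause time for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Storm.
public class GcPauseCheck {
    /** Approximate total milliseconds spent in GC, summed over all collectors, since JVM start. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // returns -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcMillis();
        Thread.sleep(30_000); // sampling window, chosen arbitrarily
        long after = totalGcMillis();
        System.out.println("GC time in last 30s: " + (after - before) + " ms");
    }
}
```

If the reported GC time stays far below the ~28 s gap in the worker log, GC is 
probably not what is starving the heartbeat.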

Which timeout should we increase for the heartbeats in question? Is it a 
config on the Zookeeper side that we can increase to make our topology more 
resilient to these restarts?
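
My working guess, which I would appreciate confirmation of, is that the 
Zookeeper server clamps the session timeout a client requests to the range 
[minSessionTimeout, maxSessionTimeout], which default to 2x and 20x the 
server's tickTime, and that the client gives up on a read after 2/3 of the 
negotiated value. The sketch below just does that arithmetic. The class and 
method names are hypothetical, and tickTime=2000 is an assumption (the value in 
the sample zoo.cfg); I have not checked what our servers actually use.

```java
// Hypothetical illustration of the arithmetic only; this is not ZooKeeper code.
public class SessionTimeoutSketch {
    /** Clamp the requested session timeout to [2 * tickTime, 20 * tickTime]. */
    public static int negotiated(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;   // assumed default minSessionTimeout
        int max = 20 * tickTimeMs;  // assumed default maxSessionTimeout
        return Math.max(min, Math.min(max, requestedMs));
    }

    /** Client-side read timeout: two thirds of the negotiated session timeout. */
    public static int readTimeout(int negotiatedMs) {
        return negotiatedMs * 2 / 3;
    }

    public static void main(String[] args) {
        int requested = 300000; // storm.zookeeper.session.timeout from our storm.yaml
        int tickTime = 2000;    // assumption, not verified on our cluster
        int session = negotiated(requested, tickTime);
        System.out.println("negotiated session timeout: " + session + " ms");
        System.out.println("client read timeout: " + readTimeout(session) + " ms");
    }
}
```

Under those assumptions our 300000 ms request would be cut to 40000 ms, with a 
read timeout of roughly 26.7 s, which is in the neighborhood of the 28105 ms in 
the worker log.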

Michael

> Date: Fri, 23 May 2014 15:50:50 -0500
> From: der...@yahoo-inc.com
> To: user@storm.incubator.apache.org
> Subject: Re: Workers constantly restarted due to session timeout
> 
> > 2) Is this expected behavior for Storm to be unable to keep up with 
> > heartbeat threads under high CPU or is our theory incorrect?
> 
> Check your JVM max heap size (-Xmx).  If you use too much, the JVM will 
> garbage-collect, and that will stop everything--including the thread whose 
> job it is to do the heartbeating.
> 
> 
> 
> -- 
> Derek
> 
> On 5/23/14, 15:38, Michael Dev wrote:
> > Hi all,
> >
> > We are seeing our workers constantly being killed by Storm, with the 
> > following logs:
> > worker: 2014-05-23 20:15:08 INFO ClientCxn:1157 - Client session timed out, 
> > have not heard from the server in 28105ms for sessionid 0x14619bf2f4e0109, 
> > closing socket and attempting reconnect
> > supervisor: 2014-05-23 20:17:30 INFO supervisor:0 - Shutting down and 
> > clearing state for id 94349373-74ec-484b-a9f8-a5076e17d474. Current 
> > supervisor time: 1400876250. State: :disallowed, Heartbeat: 
> > #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1400876249, 
> > :storm-id "test-46-1400863199", :executors #{[-1 -1]}, :port 6700}
> >
> > Eventually Storm decides to just kill the worker and restart it, as you see 
> > in the supervisor log. We theorize that the Zookeeper heartbeat thread is 
> > being starved by very high CPU load on the machine (near 
> > 100%).
> >
> > I have increased the connection timeouts in the storm.yaml config file yet 
> > Storm seems to continue to use some unknown value for the above client 
> > session timeout messages:
> > storm.zookeeper.connection.timeout: 300000
> > storm.zookeeper.session.timeout: 300000
> >
> > 1) What timeout config is appropriate for the above timeout message?
> > 2) Is this expected behavior for Storm to be unable to keep up with 
> > heartbeat threads under high CPU or is our theory incorrect?
> >
> > Thanks,
> > Michael