Can you share your storm config and version?

> On May 29, 2014, at 12:45 PM, Michael Dev <michael_...@outlook.com> wrote:
> 
> Derek,
> 
> We are currently running with -Xmx60G and only about 20-30G of that has been 
> observed to be used. I'm still observing workers restarted every 2 minutes.
> 
> Which timeout should we increase for the heartbeats in question? Is it 
> a config on the Zookeeper side we can increase to make our topology more 
> resilient to these restarts?
> 
> Michael
> 
> > Date: Fri, 23 May 2014 15:50:50 -0500
> > From: der...@yahoo-inc.com
> > To: user@storm.incubator.apache.org
> > Subject: Re: Workers constantly restarted due to session timeout
> > 
> > > 2) Is this expected behavior for Storm to be unable to keep up with 
> > > heartbeat threads under high CPU or is our theory incorrect?
> > 
> > Check your JVM max heap size (-Xmx). If you use too much of the heap, the 
> > JVM will garbage-collect, and that will stop everything--including the 
> > thread whose job it is to do the heartbeating.
> > 
> > 
> > 
> > -- 
> > Derek
> > 
> > On 5/23/14, 15:38, Michael Dev wrote:
> > > Hi all,
> > >
> > > We are seeing our workers constantly being killed by Storm with the 
> > > following logs:
> > > worker: 2014-05-23 20:15:08 INFO ClientCxn:1157 - Client session timed 
> > > out, have not heard from the server in 28105ms for sessionid 
> > > 0x14619bf2f4e0109, closing socket and attempting reconnect
> > > supervisor: 2014-05-23 20:17:30 INFO supervisor:0 - Shutting down and 
> > > clearing state for id 94349373-74ec-484b-a9f8-a5076e17d474. Current 
> > > supervisor time: 1400876250. State: :disallowed, Heartbeat: 
> > > #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1400876249, 
> > > :storm-id "test-46-1400863199", :executors #{[-1 -1]}, :port 6700}
> > >
> > > Eventually Storm decides to just kill the worker and restart it, as you 
> > > see in the supervisor log. We theorize that the Zookeeper heartbeat 
> > > thread is being starved by very high CPU load on the machine 
> > > (near 100%).
> > >
> > > I have increased the connection timeouts in the storm.yaml config file 
> > > yet Storm seems to continue to use some unknown value for the above 
> > > client session timeout messages:
> > > storm.zookeeper.connection.timeout: 300000
> > > storm.zookeeper.session.timeout: 300000
> > >
> > > 1) What timeout config is appropriate for the above timeout message?
> > > 2) Is this expected behavior for Storm to be unable to keep up with 
> > > heartbeat threads under high CPU or is our theory incorrect?
> > >
> > > Thanks,
> > > Michael
> > > 
> > >
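
On the question of which timeout applies: one possibility, offered as an assumption rather than a diagnosis, is that the ZooKeeper server is capping the requested session timeout. A ZooKeeper server clamps each client's requested timeout to the range [minSessionTimeout, maxSessionTimeout], which default to 2x and 20x tickTime, and the client logs "have not heard from the server" once it has been idle for about two thirds of the negotiated value. The Python sketch below reproduces that arithmetic. The value tickTime=2000 is assumed (it is what the sample zoo.cfg ships with), and the function names are hypothetical.

```python
# Sketch of ZooKeeper session-timeout negotiation (illustrative only).
# Assumption: the server uses tickTime=2000 ms and the default
# minSessionTimeout / maxSessionTimeout (2x and 20x tickTime).

def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_session_timeout_ms=None,
                               max_session_timeout_ms=None):
    """Clamp the client's requested timeout to the server's allowed range."""
    lo = 2 * tick_time_ms if min_session_timeout_ms is None else min_session_timeout_ms
    hi = 20 * tick_time_ms if max_session_timeout_ms is None else max_session_timeout_ms
    return max(lo, min(requested_ms, hi))


def client_read_timeout(negotiated_ms):
    """Idle time after which the client gives up on the server (2/3 of session)."""
    return negotiated_ms * 2 // 3


if __name__ == "__main__":
    requested = 300000  # storm.zookeeper.session.timeout from the thread
    negotiated = negotiated_session_timeout(requested)
    print("negotiated session timeout (ms):", negotiated)
    print("client read timeout (ms):", client_read_timeout(negotiated))
```

Under these assumptions a requested 300000 ms is negotiated down to 40000 ms, giving a client read timeout of 26666 ms. The logged 28105 ms is consistent with that but does not prove it. If this is what is happening, maxSessionTimeout in zoo.cfg would be the Zookeeper-side setting to look at.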
