Derek,
We are currently running with -Xmx60G, and only about 20-30G of that has been
observed in use. I'm still seeing workers restarted every 2 minutes.
Which timeout should we increase for the heartbeats in question? Is there a
config on the ZooKeeper side that we can increase to make our topology more
resilient to these restarts?
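For reference, here is a rough sketch of how we understand the session timeout
to be negotiated. It assumes the ZooKeeper server runs with the default
tickTime of 2000 ms and the default bounds (minSessionTimeout = 2 * tickTime,
maxSessionTimeout = 20 * tickTime), that the worker requests the
storm.zookeeper.session.timeout value, and that the client logs "Client session
timed out" once two thirds of the negotiated timeout passes without hearing
from the server. The helper names are made up for illustration:

```python
# Hypothetical helpers: the clamping rule and the 2/3 read timeout reflect our
# reading of ZooKeeper's server defaults and client behavior, not Storm code.

TICK_TIME_MS = 2000                          # ZooKeeper default tickTime (assumed unchanged)
MIN_SESSION_TIMEOUT_MS = 2 * TICK_TIME_MS    # server default minSessionTimeout
MAX_SESSION_TIMEOUT_MS = 20 * TICK_TIME_MS   # server default maxSessionTimeout


def negotiated_session_timeout(requested_ms: int,
                               min_ms: int = MIN_SESSION_TIMEOUT_MS,
                               max_ms: int = MAX_SESSION_TIMEOUT_MS) -> int:
    """Clamp the client's requested timeout into the server's [min, max] range."""
    return max(min_ms, min(requested_ms, max_ms))


def client_read_timeout(negotiated_ms: int) -> int:
    """Silence (ms) the client tolerates before declaring the session timed out."""
    return negotiated_ms * 2 // 3


requested = 300000                       # storm.zookeeper.session.timeout from storm.yaml
negotiated = negotiated_session_timeout(requested)
read_timeout = client_read_timeout(negotiated)
print(negotiated, read_timeout)          # prints: 40000 26666
```

Under those assumptions our 300000 ms request would be capped at 40000 ms,
giving a read timeout of about 26.7 s, which is consistent with the 28105ms in
the worker log. If that reading is right, the limit would come from the
ZooKeeper server's maxSessionTimeout rather than from storm.yaml.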
Michael
> Date: Fri, 23 May 2014 15:50:50 -0500
> From: der...@yahoo-inc.com
> To: user@storm.incubator.apache.org
> Subject: Re: Workers constantly restarted due to session timeout
>
> > 2) Is it expected behavior for Storm's heartbeat threads to fall behind
> > under high CPU load, or is our theory incorrect?
>
> Check your JVM max heap size (-Xmx). If your workers use too much of it, the
> JVM will garbage-collect, and a stop-the-world collection pauses everything,
> including the thread whose job it is to do the heartbeating.
>
> --
> Derek
>
> On 5/23/14, 15:38, Michael Dev wrote:
> > Hi all,
> >
> > We are seeing our workers constantly being killed by Storm, with the
> > following logs:
> > worker: 2014-05-23 20:15:08 INFO ClientCxn:1157 - Client session timed out,
> > have not heard from the server in 28105ms for sessionid 0x14619bf2f4e0109,
> > closing socket and attempting reconnect
> > supervisor: 2014-05-23 20:17:30 INFO supervisor:0 - Shutting down and
> > clearing state for id 94349373-74ec-484b-a9f8-a5076e17d474. Current
> > supervisor time: 1400876250. State: :disallowed, Heartbeat:
> > #backtype.storm.daemon.common.WorkerHeartbeat{{:time-secs 1400876249,
> > :storm-id "test-46-1400863199", :executors #{[-1 -1]}, :port 6700}
> >
> > Eventually Storm kills the worker and restarts it, as you can see in the
> > supervisor log. Our theory is that the ZooKeeper heartbeat thread is being
> > starved by very high CPU load on the machine (near 100%).
> >
> > I have increased the connection timeouts in the storm.yaml config file, yet
> > Storm seems to keep using some other, unknown value for the client session
> > timeout shown in the log above:
> > storm.zookeeper.connection.timeout: 300000
> > storm.zookeeper.session.timeout: 300000
> >
> > 1) Which timeout config controls the timeout in the message above?
> > 2) Is it expected behavior for Storm's heartbeat threads to fall behind
> > under high CPU load, or is our theory incorrect?
> >
> > Thanks,
> > Michael
> >
> >