Re: Storm 1.2.1 - Excessive workerbeats causing long GC and thus disconnecting zookeeper

Ethan Li Thu, 23 Jan 2020 11:55:27 -0800

> 1) What is stored in Workerbeats znode?

Each worker periodically writes its heartbeat to ZooKeeper under the workerbeats znode.

> 2) Which settings control the frequency of workerbeats updates?

https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/Config.java#L1534-L1539

task.heartbeat.frequency.secs, which defaults to 3 seconds.
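
To get a feel for how this setting affects the load on ZooKeeper, here is a rough back-of-the-envelope sketch. It assumes one heartbeat write per worker per interval; the function name and the worker count of 100 are made up for illustration, and the 50KB payload is the figure from your report.

```python
def workerbeat_write_rate(num_workers, payload_bytes, heartbeat_secs):
    """Estimate ZooKeeper write load from worker heartbeats.

    Assumes each worker writes one heartbeat per interval (a simplification).
    Returns (writes per second, bytes per second).
    """
    if heartbeat_secs <= 0:
        raise ValueError("heartbeat_secs must be positive")
    writes_per_sec = num_workers / heartbeat_secs
    return writes_per_sec, writes_per_sec * payload_bytes


if __name__ == "__main__":
    workers = 100          # hypothetical cluster size
    payload = 50 * 1024    # ~50KB per heartbeat, as reported in the question
    for secs in (3, 10):   # default interval vs. a less frequent one
        writes, nbytes = workerbeat_write_rate(workers, payload, secs)
        print(f"{secs}s interval: {writes:.1f} writes/s, {nbytes / 1024:.0f} KB/s")
```

Under these assumptions the write rate falls in proportion to the interval, so tripling the interval cuts the heartbeat traffic to roughly a third.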

> 3) What will be the impact if the frequency is reduced?

Nimbus reads the worker status from the workerbeats znode to determine whether 
the executors on each worker are alive. 
https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/Config.java#L595-L601

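If no heartbeat is received within nimbus.task.timeout.secs (default 30 
seconds), Nimbus considers the affected executor dead and tries to reschedule 
it. So if you send heartbeats less often, keep the interval well below this 
timeout.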
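
As a rough illustration of that trade-off, the sketch below computes how much headroom a given heartbeat interval leaves before the timeout. This is a simplified model: it assumes heartbeats are written on a fixed schedule and ignores how often Nimbus actually checks them. The function name is hypothetical.

```python
def heartbeat_margin(heartbeat_secs, timeout_secs):
    """Return (intervals that fit in the timeout, worst-case stall tolerated).

    Simplified model: if a worker stalls just before its next heartbeat is
    due, its last heartbeat is already heartbeat_secs old, so only
    timeout_secs - heartbeat_secs of stall remains before the timeout.
    """
    if heartbeat_secs <= 0:
        raise ValueError("heartbeat_secs must be positive")
    if timeout_secs <= heartbeat_secs:
        raise ValueError("timeout_secs must exceed heartbeat_secs")
    intervals = timeout_secs // heartbeat_secs
    worst_case_stall = timeout_secs - heartbeat_secs
    return intervals, worst_case_stall


if __name__ == "__main__":
    timeout = 30  # nimbus.task.timeout.secs default
    for secs in (3, 10, 15):
        intervals, stall = heartbeat_margin(secs, timeout)
        print(f"interval {secs}s: {intervals} intervals per timeout, "
              f"tolerates a stall of up to {stall}s")
```

With the defaults there are ten heartbeat intervals per timeout window. Raising the interval shrinks that margin, so a long pause on a worker becomes more likely to trigger a reschedule unless the timeout is raised as well.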

To reduce the heartbeat load on ZooKeeper, the Pacemaker component was introduced. 
https://github.com/apache/storm/blob/master/docs/Pacemaker.md 
You might want to use it too. 

Thanks

> On Dec 10, 2019, at 4:36 PM, Surajeet Dev <[email protected]> wrote:
> 
> We upgraded Storm to version 1.2.1, and since then have been consistently 
> observing ZooKeeper session timeouts. 
> 
> On analysis, we observed a high frequency of updates on the workerbeats 
> znode, with data up to 50KB in size. This causes the garbage collector to 
> kick in with pauses lasting more than 15 secs, resulting in ZooKeeper 
> session timeouts.
> 
> I understand that increasing the session timeout will alleviate the issue, 
> but we have already done that twice. 
> 
> My questions are:
> 
> 1) What is stored in Workerbeats znode?
> 2) Which settings control the frequency of workerbeats updates?
> 3) What will be the impact if the frequency is reduced?
> 
>
