> 1) What is stored in Workerbeats znode?

Each worker periodically writes its heartbeat to ZooKeeper under the workerbeats znode.

> 2) Which settings control the frequency of workerbeats update

task.heartbeat.frequency.secs (default: 3). See
https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/Config.java#L1534-L1539

> 3) What will be the impact if the frequency is reduced

Nimbus reads the worker status from the workerbeats znode to determine whether the executors on each worker are alive. See
https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/Config.java#L595-L601

If an executor's latest heartbeat is older than nimbus.task.timeout.secs (default: 30), Nimbus considers that executor dead and tries to reschedule it. So heartbeating less often is safe only as long as the interval stays well below that timeout.

To reduce the heartbeat load on ZooKeeper, the Pacemaker component was introduced:
https://github.com/apache/storm/blob/master/docs/Pacemaker.md
You might want to use it too.

Thanks

> On Dec 10, 2019, at 4:36 PM, Surajeet Dev <[email protected]> wrote:
>
> We upgraded Storm to version 1.2.1, and since then have been consistently
> observing ZooKeeper session timeouts.
>
> On analysis, we observed that there is a high frequency of updates on the
> workerbeats znode, with data up to 50KB in size. This causes the garbage
> collector to kick off, lasting more than 15 secs, resulting in ZooKeeper
> session timeouts.
>
> I understand increasing the session timeout will alleviate the issue, but
> we have already done that twice.
>
> My questions are:
>
> 1) What is stored in Workerbeats znode?
> 2) Which settings control the frequency of workerbeats update
> 3) What will be the impact if the frequency is reduced
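
For reference, the two settings named in the reply could be set in storm.yaml roughly as follows. This is only a sketch: the values shown are the defaults quoted in the reply, not tuning recommendations, and the comments are my reading of how the two settings interact.

```yaml
# How often (in seconds) heartbeats are written to the workerbeats znode.
# 3 is the default quoted above; a larger value means fewer ZooKeeper writes.
task.heartbeat.frequency.secs: 3

# How long (in seconds) Nimbus waits for a heartbeat before treating an
# executor as dead and rescheduling it. 30 is the default quoted above.
# Keep this comfortably larger than the heartbeat interval.
nimbus.task.timeout.secs: 30
```

If the heartbeat interval is raised, it seems prudent to check that a few missed or delayed heartbeats would still fit inside the Nimbus timeout, otherwise healthy executors may get rescheduled.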
