Heartbeating is done in separate threads from the work execution
threads, so the reason you needed to increase the timeouts isn't as
straightforward as it may appear.  For the nimbus.task.timeout.secs and
supervisor.worker.timeout.secs timeouts to be exceeded, the worker process
(normally [1]) needs to have completely locked up.  The most common cause
of that is garbage collection, and with the values you are talking about
I wouldn't be surprised to learn you are incurring long GC pauses.  I would
look into that rather than just bumping up the timeouts.
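
To illustrate the point about separate threads, here is a toy sketch in
Python.  It is not Storm code: run_worker, its parameters and the list of
timestamps are all made up for the example, and the real heartbeats go to
local disk and ZooKeeper.  It only shows that a work thread blocked on a
slow operation does not, by itself, stop a separate heartbeat thread.

```python
import threading
import time


def run_worker(work_secs, heartbeat_interval_secs, timeout_secs):
    """Block the 'work' thread while a separate thread heartbeats.

    Returns (largest gap between heartbeats in seconds, whether that gap
    would have exceeded timeout_secs).
    """
    beats = [time.monotonic()]
    stop = threading.Event()

    def heartbeat():
        # Hypothetical stand-in for the worker's heartbeat thread; the real
        # one writes to local disk and ZooKeeper instead of a list.
        while not stop.wait(heartbeat_interval_secs):
            beats.append(time.monotonic())

    beater = threading.Thread(target=heartbeat, daemon=True)
    beater.start()

    # Stand-in for a bolt stuck waiting on a slow MySQL insert.
    time.sleep(work_secs)

    stop.set()
    beater.join()

    # Close the final interval, then measure the gaps between beats.
    beats.append(time.monotonic())
    gaps = [later - earlier for earlier, later in zip(beats, beats[1:])]
    max_gap = max(gaps)
    return max_gap, max_gap > timeout_secs


if __name__ == "__main__":
    max_gap, timed_out = run_worker(
        work_secs=0.3, heartbeat_interval_secs=0.02, timeout_secs=0.2
    )
    print("largest heartbeat gap: %.3fs, timed out: %s" % (max_gap, timed_out))
```

A stop-the-world GC pause is different from the sleep above: it freezes
every thread in the JVM, the heartbeat threads included, which is why long
pauses can trip these timeouts when a slow insert alone should not.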

- Erik

[1] Since the heartbeating is done via disk writes (worker to supervisor)
and ZooKeeper (worker to nimbus), it is *possible* that enough delay is
being introduced in those channels to cause the default 30-second timeouts
to expire, but it seems pretty unlikely.

On Tue, Jul 26, 2016 at 11:12 PM, Navin Ipe <[email protected]> wrote:

> Hi,
>
> Recently, I needed to increase the ZooKeeper timeouts to:
>
>    - tickTime=20000
>    - initLimit=10
>    - syncLimit=15
>
>
> and the storm.yaml timeouts to:
>
>    - supervisor.worker.timeout.secs: 600
>    - nimbus.task.timeout.secs: 600
>    - nimbus.supervisor.timeout.secs: 600
>
>
> I did this because each of my bolts had to write
> <http://programmers.stackexchange.com/questions/325681/concurrent-inserts-to-mysql-or-write-to-separate-tables-and-consolidate-it>
> at least 100000 rows in batches of 1000 to the same table in MySQL, and
> each bolt took time to ack because it couldn't insert into MySQL until
> another bolt had finished inserting. This caused ZooKeeper to not receive
> a heartbeat and time out.
>
> My supervisor advised that the better way to tackle this problem would be
> to:
>
>    - Leave the ZooKeeper and Storm defaults as they are because the
>    creators of Storm had designed the defaults for a reason.
>    - Also because every time we upgrade Storm to a newer version, we'd
>    have to remember to change those parameters.
>    - Perhaps the design of the topology could be changed to have the
>    bolts write fewer rows and ack more quickly so that there is no timeout.
>
> *My questions:*
> Are the Storm and ZooKeeper defaults better left alone for the above
> reasons?
>
>
> --
> Regards,
> Navin
>
