Heartbeating is done in separate threads from the work execution threads, so the reason you needed to increase the timeouts isn't as straightforward as it may appear. For the nimbus.task.timeout.secs and supervisor.worker.timeout.secs timeouts to be exceeded, the worker process (normally [1]) needs to have completely locked up. The most common reason for that is garbage collection, and with the values you are talking about I wouldn't be surprised to learn you are incurring large GC pauses. I would look into that rather than just bumping up the timeouts.
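
To illustrate the point about separate threads, here is a rough sketch in Python. It is a toy model, not Storm's actual code: the names simulate, heartbeat, slow_work and max_gap are made up for this example, and the sleep merely stands in for a bolt blocked on a slow insert.

```python
import threading
import time


def simulate(work_secs=0.6, heartbeat_interval=0.02):
    """Run a blocking 'work' thread beside a heartbeat thread.

    Returns the heartbeat timestamps recorded while the work ran.
    """
    beats = []
    done = threading.Event()

    def heartbeat():
        # Stand-in for the worker's heartbeat writes (hypothetical, not Storm code).
        while not done.is_set():
            beats.append(time.monotonic())
            done.wait(heartbeat_interval)

    def slow_work():
        # Stand-in for a bolt blocked on a slow MySQL insert.
        time.sleep(work_secs)
        done.set()

    threads = [threading.Thread(target=heartbeat),
               threading.Thread(target=slow_work)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return beats


def max_gap(beats):
    """Largest interval between consecutive heartbeats, in seconds."""
    return max(b - a for a, b in zip(beats, beats[1:]))


if __name__ == "__main__":
    beats = simulate()
    print(f"{len(beats)} heartbeats, largest gap {max_gap(beats):.3f}s")
```

In this model the largest gap stays close to the heartbeat interval no matter how long the work thread blocks. Only something that halts every thread in the process at once, such as a stop-the-world GC pause, would stretch it toward the timeout.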

- Erik

[1] Since the heartbeating is done via disk writes (worker to supervisor) and ZooKeeper (worker to nimbus), it is *possible* that enough delay is being introduced in those channels to cause the 30 second timeouts to expire, but it seems pretty unlikely.

On Tue, Jul 26, 2016 at 11:12 PM, Navin Ipe <[email protected]> wrote:

> Hi,
>
> Recently, I needed to increase the zookeeper timeout to:
>
>    - tickTime=20000
>    - initLimit=10
>    - syncLimit=15
>
> and storm.yaml defaults to:
>
>    - supervisor.worker.timeout.secs: 600
>    - nimbus.task.timeout.secs: 600
>    - nimbus.supervisor.timeout.secs: 600
>
> Did this because each of my bolts had to write
> <http://programmers.stackexchange.com/questions/325681/concurrent-inserts-to-mysql-or-write-to-separate-tables-and-consolidate-it>
> at least 100000 rows in batches of 1000 to the same table in MySQL, and
> each bolt took time to ack because it couldn't insert to SQL until another
> bolt had finished inserting. This caused Zookeeper to not receive a
> heartbeat and timeout.
>
> My supervisor advised that the better way to tackle this problem would be
> to:
>
>    - Leave the zookeeper and storm defaults as it is because the creators
>    of Storm had designed the defaults for a reason.
>    - Also because everytime we upgrade Storm to a newer version, we'd
>    have to remember to change those parameters.
>    - Perhaps the design of the topology could be changed to have the
>    bolts write fewer rows and ack more quickly so that there is no timeout.
>
> *My questions:*
> Are the storm and zookeeper defaults better left alone for the above
> reasons?
>
> --
> Regards,
> Navin
