As I said, you should figure out *why* your processes are locking up IMHO. Changing the timeouts simply covers up the behavior.
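
If you want to check the GC theory before touching any timeouts, something like the following might help. It is only a rough sketch. It assumes the workers run on a HotSpot JVM with GC logging enabled (for example `-XX:+PrintGCDetails` passed through `worker.childopts`), so that each collection is logged with a `[Times: ... real=N secs]` entry. The function name `long_pauses`, the 10 second default threshold and the two sample lines are made up for illustration and are not taken from your logs.

```python
import re

# Wall-clock pause as reported in HotSpot's detailed GC output,
# e.g. "[Times: user=1.20 sys=0.05, real=35.12 secs]".
REAL_PAUSE = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_pauses(lines, threshold_secs=10.0):
    """Return (line_number, pause_secs) for each GC entry at or above the threshold."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = REAL_PAUSE.search(line)
        if match is None:
            continue  # not a GC timing line
        pause = float(match.group(1))
        if pause >= threshold_secs:
            hits.append((number, pause))
    return hits


# Hypothetical sample lines, shortened with "..." (not real output from any cluster).
sample = [
    "... [GC (Allocation Failure) ... [Times: user=0.05 sys=0.01, real=0.02 secs]",
    "... [Full GC (Ergonomics) ... [Times: user=41.10 sys=0.30, real=35.12 secs]",
]

for number, pause in long_pauses(sample):
    print(f"line {number}: {pause:.2f}s pause")
```

To run it against a real log, pass the open file object instead of `sample`. Any pause approaching the 30 second default heartbeat timeout would point at GC rather than at the timeouts themselves.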

On Wed, Jul 27, 2016 at 12:00 AM, Navin Ipe <[email protected]> wrote:

> @Satish: I'm not using table locks, but concurrent inserts to the same
> table get queued up, right?
> http://stackoverflow.com/questions/32087233/concurrent-insert-with-mysql/32288484#32288484
>
> The original question is still open. Are the defaults best left alone?
>
> On Wed, Jul 27, 2016 at 12:15 PM, Satish Duggana <[email protected]>
> wrote:
>
>> Why would each bolt need to wait till others to complete? Are you using
>> table locks while inserting the data to mysql, is it really intended?
>>
>> Thanks,
>> Satish.
>>
>> On Wed, Jul 27, 2016 at 11:54 AM, Erik Weathers <[email protected]>
>> wrote:
>>
>>> The heartbeating is done in separate threads from the work execution
>>> threads, so the reason for your need to increase the timeouts isn't as
>>> straight forward as it may appear. In order for the
>>> nimbus.task.timeout.secs & supervisor.worker.timeout.secs timeouts to be
>>> exceeded, the worker process (normally [1]) needs to have completely locked
>>> up. The most common reason for that would be garbage collection, and with
>>> these values you are talking about I wouldn't be surprised to learn you are
>>> incurring large GC pauses. I would look into that rather than just bumping
>>> up the timeouts.
>>>
>>> - Erik
>>>
>>> [1] since the heartbeating is done via disk writes (worker to
>>> supervisor) and ZooKeeper (worker to nimbus), it is *possible* that enough
>>> delay is being introduced in those channels to cause the 30 second timeouts
>>> to expire, but it seems pretty unlikely.
>>>
>>> On Tue, Jul 26, 2016 at 11:12 PM, Navin Ipe <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Recently, I needed to increase the zookeeper timeout to:
>>>>
>>>> - tickTime=20000
>>>> - initLimit=10
>>>> - syncLimit=15
>>>>
>>>> and storm.yaml defaults to:
>>>>
>>>> - supervisor.worker.timeout.secs: 600
>>>> - nimbus.task.timeout.secs: 600
>>>> - nimbus.supervisor.timeout.secs: 600
>>>>
>>>> Did this because each of my bolts had to write
>>>> <http://programmers.stackexchange.com/questions/325681/concurrent-inserts-to-mysql-or-write-to-separate-tables-and-consolidate-it>
>>>> at least 100000 rows in batches of 1000 to the same table in MySQL, and
>>>> each bolt took time to ack because it couldn't insert to SQL until another
>>>> bolt had finished inserting. This caused Zookeeper to not receive a
>>>> heartbeat and timeout.
>>>>
>>>> My supervisor advised that the better way to tackle this problem would
>>>> be to:
>>>>
>>>> - Leave the zookeeper and storm defaults as it is because the
>>>>   creators of Storm had designed the defaults for a reason.
>>>> - Also because everytime we upgrade Storm to a newer version, we'd
>>>>   have to remember to change those parameters.
>>>> - Perhaps the design of the topology could be changed to have the
>>>>   bolts write fewer rows and ack more quickly so that there is no timeout.
>>>>
>>>> *My questions:*
>>>> Are the storm and zookeeper defaults better left alone for the above
>>>> reasons?
>>>>
>>>> --
>>>> Regards,
>>>> Navin
>
> --
> Regards,
> Navin
