Topology Restart due to Executor Not Alive

Josh Walton Wed, 12 Mar 2014 10:33:11 -0700

Overnight, it appears my Storm Trident topology restarted itself. When I
checked the Storm UI, it said the topology had been running for 24 hours,
and showed no errors or exceptions in any of the bolts.


I checked the nimbus log and saw the following:

2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[34 34] not alive
2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[4 4] not alive
2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[40 40] not alive
2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[10 10] not alive
2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[16 16] not alive
2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[22 22] not alive
2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[28 28] not alive
2014-03-12 10:55:06 b.s.s.EvenScheduler [INFO] Available slots: (["5d105f66-1add-421b-8265-e7340a95928c" 6700] ["32ab1745-c260-4491-ae4d-92dcc5d14a62" 6700])
2014-03-12 10:55:06 b.s.d.nimbus [INFO] Reassigning MITAS3-74-1394565794 to 6 slots
2014-03-12 10:55:06 b.s.d.nimbus [INFO] Reassign executors: [[34 34] [4 4] [40 40] [10 10] [16 16] [22 22] [28 28]]
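
For anyone comparing against their own nimbus log, here is a minimal sketch of a script that collects the executors reported as "not alive", grouped by topology id. The function name dead_executors and the sample text are only illustrative, and it assumes each bracketed pair is an executor's start and end task id. The pattern tolerates entries that an email client has wrapped across two lines.

```python
import re

# Matches entries such as
#   "Executor MITAS3-74-1394565794:[34 34] not alive"
# Whitespace is matched with \s+ so that entries wrapped across
# two lines (as in an emailed log excerpt) are still recognized.
EXECUTOR_NOT_ALIVE = re.compile(
    r"Executor\s+(\S+?):\[(\d+)\s+(\d+)\]\s+not\s+alive"
)


def dead_executors(log_text):
    """Return {topology_id: sorted list of (start_task, end_task)}.

    Assumption: the bracketed pair is the executor's start and end task id.
    """
    found = {}
    for topology_id, start, end in EXECUTOR_NOT_ALIVE.findall(log_text):
        found.setdefault(topology_id, set()).add((int(start), int(end)))
    return {topology_id: sorted(ids) for topology_id, ids in found.items()}


if __name__ == "__main__":
    # Two entries copied from the excerpt, including the mail line wrapping.
    sample = (
        "2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[34\n"
        "34] not alive\n"
        "2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[4 4]\n"
        "not alive\n"
    )
    for topology_id, executors in dead_executors(sample).items():
        print(topology_id, executors)
```

Running it over the full nimbus log gives the set of executors that were flagged, which can then be compared with the "Reassign executors" line that follows.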

It appears that these executors had been running and must have timed out
somehow, since I didn't see any exceptions or stack traces in the logs.

Is there a way to change the timeout? I see several timeout settings, but
I'm not sure whether any of them would help prevent this type of restart. I
am using a custom TridentState which holds data in memory, so we lost data
as a result of this restart, and I would like to prevent this from happening
again.

Thanks

Josh
