Ganesh,
Nimbus plays a role similar to Hadoop's JobTracker. It makes sense that the job
resumes only after Nimbus is working correctly again. Otherwise, the state of
the running job would have been lost.
Thanks
On Friday, January 8, 2016 1:07 PM, Ganesh Chandrasekaran
<[email protected]> wrote:
I wanted
to understand how Storm works when one of its workers crashes. So here is the
situation I ran into recently. My topology is distributed across 2 workers with
a total of 6 threads. Somehow 3 threads died because one worker went down. At
the same time, the Nimbus service was also down, so it could not spin up
threads on the other available workers. I noticed Storm wasn’t processing
messages for the topology until Nimbus was restored and it spun up the threads
that were down. Is this the expected behavior? I was expecting Storm to
continue processing messages with half of the threads still up on the other
worker.

Thanks,
Ganesh