Hi Abhishek,

Did you check whether the spout is really emitting messages?

On Thu, Aug 4, 2016 at 5:42 PM, Abhishek Raj <[email protected]> wrote:
> Thanks for the quick response. According to the Storm documentation, if a
> worker/node dies it's automatically restarted. Also, the bolts still show
> up in the Storm UI. They just don't seem to be processing any data. The link
> you mentioned could have been of great help, but we're stuck on an old
> version right now which doesn't have those features, and upgrading is not an
> option.
> What could be other possible reasons for a bolt to completely hang while
> the rest of the topology works fine?
>
> On Aug 4, 2016 4:44 PM, "Navin Ipe" <[email protected]> wrote:
>
>> The last time I encountered crashes that left no error messages was when
>> the OS killed a process that took up too much memory. This gets
>> worse on Ubuntu systems, where there is no log registered about the OOM
>> killer even in the system logs.
>> For debugging Storm, there are these options:
>> https://community.hortonworks.com/articles/36151/debugging-an-apache-storm-topology.html
>>
>> On a side note, having 8 bolts seems like a rather complicated situation,
>> if it is Spout ---> Bolt1 ---> Bolt2 ---> Bolt3 ---> and so on --->
>> Bolt8. It takes too long for an ack. Design change recommended.
>>
>> On Thu, Aug 4, 2016 at 2:35 PM, Abhishek Raj <[email protected]> wrote:
>>
>>> Hi.
>>>
>>> We are using Storm 0.9.4. Our topology consists of a linear chain of 1
>>> spout and 8 bolts. In the 4th bolt we call an external bolt written in PHP
>>> which emits to the 5th bolt after some processing.
>>> We are seeing that after some time, the 6th, 7th and 8th bolts completely
>>> stop processing. The executed, acked, emitted and transferred numbers drop
>>> to zero for these bolts and there are no error messages in the worker logs.
>>> Other bolts still seem to be processing data and emitting, but the last 3
>>> bolts completely halt and do no processing. The failed count keeps
>>> increasing on the Kafka spout, but the failed count of the individual bolts
>>> still remains 0.
>>> We already tried increasing the tuple timeout threshold and decreasing
>>> max-spout-pending, to no avail. Eventually, the bolts completely stopped
>>> processing. We are not really sure if it has something to do with the
>>> external PHP bolt that we call, because it still seems to be processing data
>>> fine and sends heartbeats.
>>>
>>> Any pointers about how to go about debugging this would be great.
>>>
>>> --
>>> Abhishek
>>>
>>
>> --
>> Regards,
>> Navin
>>
>
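
For reference, the two settings mentioned in the original message (tuple timeout and max spout pending) correspond to the topology config keys topology.message.timeout.secs and topology.max.spout.pending. Below is a minimal sketch, assuming a plain map stands in for Storm's Config object (which is itself a map keyed by these names). The class name, the helper name and the values 120 and 500 are hypothetical and only for illustration; they are not taken from the topology discussed above and are not recommendations.

```java
import java.util.HashMap;
import java.util.Map;

public class TopologySettingsSketch {
    // Storm topology config key names; the values used below are illustrative only.
    static final String MESSAGE_TIMEOUT_SECS = "topology.message.timeout.secs";
    static final String MAX_SPOUT_PENDING = "topology.max.spout.pending";

    // Hypothetical helper: collects the two settings into a map, the same
    // shape that would be handed to Storm when submitting a topology.
    static Map<String, Object> buildSettings(int timeoutSecs, int maxPending) {
        Map<String, Object> conf = new HashMap<>();
        // Seconds a tuple tree may stay un-acked before the spout sees it as failed.
        conf.put(MESSAGE_TIMEOUT_SECS, timeoutSecs);
        // Upper bound on un-acked tuples in flight per spout task.
        conf.put(MAX_SPOUT_PENDING, maxPending);
        return conf;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = buildSettings(120, 500);
        System.out.println(conf);
    }
}
```

Printing the effective values at submit time makes it easier to confirm that a change to either setting actually reached the running topology before concluding that it had no effect.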
