I have had these painful issues with the multilang protocol in PHP. I would see PHP processes running on the Linux supervisor machines, but they would stop processing with no errors. This is why Nimbus would not reassign them either. I spent several hours debugging with no luck, and eventually rewrote all my bolts in Java, wrapped the PHP logic in a web service, and called that from Java. Except for a negligible latency, things have been working great ever since.

________________________________
From: Abhishek Raj <[email protected]>
Sent: Thursday, August 4, 2016 11:14:03 AM
To: [email protected]
Subject: Re: Some bolts stop processing after a while.
Hi Satish,

Yes, the spout is emitting messages. The topology works fine at the start. Then the failed count on the spout gradually increases and the last 3 bolts stop processing after a while, but the other bolts still process and emit.

On Thu, Aug 4, 2016 at 8:11 PM, Satish Duggana <[email protected]> wrote:

Hi Abhishek,

Did you check whether the spout is really emitting messages?

On Thu, Aug 4, 2016 at 5:42 PM, Abhishek Raj <[email protected]> wrote:

Thanks for the quick response. According to the Storm documentation, if a worker/node dies it's automatically restarted. Also, the bolts still show up in the Storm UI. They just don't seem to be processing any data.

The link you mentioned could have been of great help, but we're stuck on an old version right now which doesn't have those features, and upgrading is not an option. What could be other possible reasons for a bolt to completely hang while the rest of the topology works fine?

On Aug 4, 2016 4:44 PM, "Navin Ipe" <[email protected]> wrote:

The last time I encountered crashes that left no error messages was when the OS killed a process that took up too much processing power. This gets worse on Ubuntu systems, where there is no log registered about the OOM killer even in the system logs.

For debugging Storm, there are these options:
https://community.hortonworks.com/articles/36151/debugging-an-apache-storm-topology.html

On a side note, having 8 bolts seems like a rather complicated situation. This is if it is Spout ---> Bolt1 ---> Bolt2 ---> Bolt3 ---> and so on ---> Bolt8. Takes too long for an ack. Design change recommended.

On Thu, Aug 4, 2016 at 2:35 PM, Abhishek Raj <[email protected]> wrote:

Hi.

We are using Storm 0.9.4. Our topology consists of a linear chain of 1 spout and 8 bolts. In the 4th bolt we call an external bolt written in PHP, which emits to the 5th bolt after some processing.
We are seeing that after some time, the 6th, 7th and 8th bolts completely stop processing. The executed, acked, emitted and transferred numbers drop to zero for these bolts, and there are no error messages in the worker logs. Other bolts still seem to be processing data and emitting, but the last 3 bolts completely halt and do no processing. The failed count keeps increasing on the Kafka spout, but the failed count of the individual bolts still remains 0.

We already tried increasing the tuple timeout threshold and decreasing max-spout-pending, to no avail. Eventually, the bolts completely stopped processing. We are not really sure if it has something to do with the external PHP bolt that we call, because it still seems to be processing data fine and sends heartbeats.

Any pointers about how to go about debugging this would be great.

--
Abhishek

--
Regards,
Navin

--
Abhishek
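
The workaround described in the reply at the top of this thread (PHP logic wrapped in a web service and called from Java) could look roughly like the sketch below. This is only an illustration, not the poster's actual code: the class name `PhpServiceClient`, the endpoint URL, and the timeout values are all hypothetical. It uses only the JDK's `HttpURLConnection`. The explicit connect and read timeouts are the point of the sketch: a PHP call that hangs then surfaces as an exception instead of stalling silently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: calls PHP logic exposed over HTTP, with hard timeouts. */
final class PhpServiceClient {
    private final URL endpoint;          // assumed URL of the PHP web service
    private final int connectTimeoutMs;  // max time to establish the connection
    private final int readTimeoutMs;     // max time to wait for response data

    PhpServiceClient(URL endpoint, int connectTimeoutMs, int readTimeoutMs) {
        this.endpoint = endpoint;
        this.connectTimeoutMs = connectTimeoutMs;
        this.readTimeoutMs = readTimeoutMs;
    }

    /**
     * POSTs the payload and returns the response body.
     * Throws an IOException (SocketTimeoutException on a timeout)
     * rather than blocking indefinitely.
     */
    String process(String payload) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        conn.setConnectTimeout(connectTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        try {
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload.getBytes(StandardCharsets.UTF_8));
            }
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("PHP service returned HTTP " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                // Read the whole body into memory, then decode it as UTF-8.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Inside a bolt's execute method, one option would be to catch the `IOException` and fail the tuple, so that a stuck PHP call shows up in the bolt's failed count rather than as a silent stall.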
