I have been seeing this behaviour on 0.9.0.1 running on AWS (non-VPC). All tuples get a fail() on the spout and I'm not sure why. Even a simple spoutA -> boltB topology shows this behaviour after a continuous flow of tuples.
So far, increasing the acker count hasn't helped. All I could figure out is that fail() is called from backtype.storm.utils.RotatingMap#rotate, which I believe means the tuple timeout (topology.message.timeout.secs, rather than topology.max.spout.pending, which is a cap on the number of in-flight tuples) was exceeded before the tuple was marked as completed. I'm pretty sure there are no exceptions in handling the tuples. Will update if I find any insights.

On Tue, Apr 15, 2014 at 3:07 PM, 朱春来 <[email protected]> wrote:

> Hi Michael Chang,
>
> Did you ack or fail the tuple in the bolt in a timely manner? Please also check how long the bolt takes to process a tuple.
>
> *From:* Michael Chang [mailto:[email protected]]
> *Sent:* April 15, 2014 16:41
> *To:* [email protected]
> *Subject:* Storm Topology Halts
>
> [email protected] all,
>
> Issue:
>
> We are having issues with stuck topologies. When submitted and started,
> our topology will start processing for a while, then completely halt for
> around topology.max.spout.pending seconds, after which it seems that all
> the in-flight tuples are failed. This cycle will loop continuously. Has
> anybody seen this issue / have suggestions about how to debug?
>
> Environment:
>
> We are running a storm cluster in AWS, non-vpc. We’re running 0.9.1 but
> using guava 16.0.1 and httpclient 4.3.1 in the lib path. We were
> originally trying this with the regular netty transport, and reverting back
> to the zmq transport seemed to help at first, but now we’re seeing the same
> behavior as well, so it seems like a deeper rooted problem than just the
> transport.
>
> Any help would be appreciated.
>
> Thanks,
>
> Michael
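For anyone following the thread, here is a rough sketch of why a fail() coming out of RotatingMap#rotate points at a timeout. It is a simplified model, not Storm's actual code: the class `PendingSketch` and its methods are hypothetical names, and it assumes the spout keeps pending tuples in a small number of buckets that are rotated once per timeout interval. A tuple that is still un-acked when its bucket falls off the end is reported as failed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of a rotating pending map; not Storm's own class. */
public class PendingSketch {
    // Newest bucket is at the head, oldest at the tail.
    private final LinkedList<Map<Long, String>> buckets = new LinkedList<>();
    // Message ids that expired without an ack (what the spout would see as fail()).
    final List<Long> failed = new ArrayList<>();

    PendingSketch(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new HashMap<>());
        }
    }

    /** Track a newly emitted tuple in the newest bucket. */
    void emit(long msgId, String tuple) {
        buckets.getFirst().put(msgId, tuple);
    }

    /** Remove a tuple that completed in time; returns false if it was not pending. */
    boolean ack(long msgId) {
        for (Map<Long, String> bucket : buckets) {
            if (bucket.remove(msgId) != null) {
                return true;
            }
        }
        return false;
    }

    /** Called once per timeout interval: drop the oldest bucket and fail what is left in it. */
    void rotate() {
        Map<Long, String> dead = buckets.removeLast();
        failed.addAll(dead.keySet());
        buckets.addFirst(new HashMap<>());
    }

    public static void main(String[] args) {
        PendingSketch pending = new PendingSketch(2);
        pending.emit(1L, "a");
        pending.emit(2L, "b");
        pending.ack(1L);   // tuple 1 completes in time
        pending.rotate();  // tuple 2 moves to the oldest bucket, nothing fails yet
        pending.rotate();  // tuple 2 is still un-acked, so it is failed
        System.out.println("failed: " + pending.failed);
    }
}
```

In this model a tuple fails only if no ack arrives before its bucket is rotated out, so if every tuple is failing, the acks are either not being sent or not reaching the spout within the timeout window.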
