Very nice discussion! I have also been wanting to see a feature similar to Ravi's comment above:
"*There is one thing i am looking forward from Storm is to inform Spout about what kind of failure it was*. i.e. if it was ConnectionTimeout or ReadTimeout etc, that means if i retry it may pass. But say it was null pointer exception(java world), i know the data which is being expected is not there and my code is not handling that scenario, so either i will have to change code or ask data provider to send that field, but retry wont help me."

I think we need to:

1) Add a new method *failWithoutRetry(Tuple, Exception)* in the collector.
2) Provide the ability to *configure a dead-letter data-store in the spout* for failed messages reported by #1 above. The configurable data-store should support Kafka, Solr and Redis to begin with (plus the option to implement one's own and drop a jar file in the classpath).

Such a feature would benefit all the spouts.

Benefits:

1) Topologies will not block replaying the same doomed-to-fail tuples.
2) Users can set alerts on dead-letters and easily find the actual problems in their topologies, rather than analyze all failed tuples only to find that they failed because of a temporary network glitch.
3) Since the entire Tuple is put into the dead-letter, all the data is available for retrying after fixing the topology code.

Thx,
SG

On Wed, Sep 14, 2016 at 7:25 AM, Hart, James W. <[email protected]> wrote:

> In my testing, when a tuple was replayed by a spout, every Kafka message
> from the replayed one to the end was replayed. That's why all bolts need
> to be idempotent, so that replays do not cause work to be done twice. I
> think it has to do with Kafka tracking the offset of the last acked message
> in a topic, not the actual ack of every message individually. This is a
> simplistic view, as it's a lot more complicated than this.
>
> If anybody can confirm this, please respond, as it was a surprise to me and
> cost me a couple of days of testing when I encountered it.
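To make the proposal concrete, here is a minimal sketch of the classification and dead-letter routing it implies. Note that nothing here is an existing Storm API: *failWithoutRetry*, *DeadLetterStore* and *InMemoryStore* are hypothetical names for the additions being discussed, and the in-memory store merely stands in for the Kafka/Solr/Redis backends mentioned above.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;

public class DeadLetterSketch {

    /** Pluggable sink for doomed tuples; real impls would write to Kafka, Solr or Redis. */
    interface DeadLetterStore {
        void store(String tuple, Exception cause);
    }

    /** In-memory stand-in so the sketch runs without any external system. */
    static class InMemoryStore implements DeadLetterStore {
        final List<String> entries = new ArrayList<>();
        @Override
        public void store(String tuple, Exception cause) {
            entries.add(tuple + " : " + cause.getClass().getSimpleName());
        }
    }

    /** Transient I/O problems (timeouts, resets) are worth replaying; code bugs are not. */
    static boolean isRetryable(Exception e) {
        return e instanceof IOException; // covers ConnectionTimeout/ReadTimeout-style failures
    }

    /** Returns "replay" for retryable failures; otherwise dead-letters the tuple. */
    static String handleFailure(String tuple, Exception e, DeadLetterStore store) {
        if (isRetryable(e)) {
            return "replay";       // let the spout re-emit the tuple
        }
        store.store(tuple, e);     // park it for alerting and offline retry
        return "dead-letter";
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        System.out.println(handleFailure("offset=5101",
                new SocketTimeoutException("read timed out"), store)); // replay
        System.out.println(handleFailure("offset=5102",
                new NullPointerException(), store));                   // dead-letter
        System.out.println(store.entries);
    }
}
```

With this split, only the NullPointerException tuple lands in the dead-letter store, so the topology keeps flowing while the broken tuple stays available for inspection.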
> *From:* [email protected] [mailto:[email protected]]
> *Sent:* Tuesday, September 13, 2016 9:22 PM
> *To:* user
> *Subject:* Re: Re: How will storm replay the tuple tree?
>
> Yes, only the failed tuples are replayed, but the whole batch will be held.
>
> So, if the tuple fails forever, will the batch be held forever?
>
> I am just not clear whether the tuple itself, or the batch which owns the
> tuple, will be held in the spout.
>
> ------------------------------
> Josh
>
> *From:* Ambud Sharma <[email protected]>
> *Date:* 2016-09-14 09:10
> *To:* user <[email protected]>
> *Subject:* Re: Re: How will storm replay the tuple tree?
>
> No, as per the code only individual messages are replayed.
>
> On Sep 13, 2016 6:09 PM, "[email protected]" <[email protected]> wrote:
>
> Hi:
>
> I'd like to get clear on something about the Kafka spout with regard to acking.
>
> For example, the kafka-spout fetches offsets 5000-6000 from the Kafka server,
> but one tuple whose offset is 5101 is failed by a bolt. The whole batch of
> 5000-6000 will remain in the kafka-spout until the 5101 tuple is acked.
> If the 5101 tuple cannot be acked for a long time, the batch 5000-6000
> will remain for a long time, and the kafka-spout will stop fetching data
> from Kafka during that time.
>
> Am I right?
>
> ------------------------------
> Josh
>
> *From:* Tech Id <[email protected]>
> *Date:* 2016-09-14 06:26
> *To:* user <[email protected]>
> *Subject:* Re: How will storm replay the tuple tree?
>
> I agree with this statement about code/architecture, but in case of some
> system outages, like one of the end-points (Solr, Couchbase, Elastic-Search
> etc.) being down temporarily, a very large number of other fully-functional
> and healthy systems will receive a large number of duplicate replays
> (especially in heavy-throughput topologies).
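The batch-holding behavior Josh describes falls out of offset-commit-based acking. The following is a simplified model, not the actual KafkaSpout code: the spout can only commit an offset once every offset up to it has been acked, so a single stuck tuple pins the commit point and holds the whole range behind it.

```java
import java.util.TreeSet;

// Toy model of an offset-commit-based spout: the commit point advances
// only through a contiguous run of acked offsets, so one un-acked offset
// (e.g. 5101) holds back everything fetched after it.
public class OffsetTrackerSketch {
    private final TreeSet<Long> acked = new TreeSet<>(); // acked but not yet committable
    private long committed;                              // everything <= committed is safely done

    OffsetTrackerSketch(long startOffset) {
        this.committed = startOffset - 1;
    }

    void ack(long offset) {
        acked.add(offset);
        // advance the commit point only while the next offset has been acked
        while (acked.contains(committed + 1)) {
            committed++;
            acked.remove(committed);
        }
    }

    long committedOffset() {
        return committed;
    }
}
```

Acking 5000-6000 with 5101 missing leaves the committed offset at 5100; the moment 5101 is finally acked, the commit point jumps straight to 6000. That matches both Ambud's point (only the individual message is replayed) and Josh's observation (the batch is effectively held).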
> If you can elaborate a little more on the performance cost of tracking
> tuples, or point to a document reflecting the same, that will be of great
> help.
>
> Best,
> T.I.
>
> On Tue, Sep 13, 2016 at 12:26 PM, Hart, James W. <[email protected]> wrote:
>
> Failures should be very infrequent; if they are not, then rethink the code
> and architecture. The performance cost of tracking tuples in the way that
> would be required to replay at the failure point is large; basically, that
> method would slow everything way down for very infrequent failures.
>
> *From:* S G [mailto:[email protected]]
> *Sent:* Tuesday, September 13, 2016 3:17 PM
> *To:* [email protected]
> *Subject:* Re: How will storm replay the tuple tree?
>
> Hi,
>
> I am a little curious to know why we begin at the spout level for case 1.
>
> If we replay at the failing bolt's parent level (BoltA in this case), then
> it should be more performant, due to a decrease in duplicate processing (as
> compared to whole tuple-tree replays).
>
> If BoltA crashes for some reason while replaying, only then should the Spout
> receive this as a failure, and the whole tuple tree should be replayed.
>
> This saving in duplicate processing will be more visible with several
> layers of bolts.
>
> I am sure there is a good reason to replay the whole tuple-tree, and I want
> to know the same.
>
> Thanks,
> SG
>
> On Tue, Sep 13, 2016 at 10:22 AM, P. Taylor Goetz <[email protected]> wrote:
>
> Hi Cheney,
>
> Replays happen at the spout level. So if there is a failure at any point
> in the tuple tree (the tuple tree being the anchored emits; unanchored
> emits don't count), the original spout tuple will be replayed. So the
> replayed tuple will traverse the topology again, including unanchored
> points.
>
> If an unanchored tuple fails downstream, it will not trigger a replay.
>
> Hope this helps.
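On the tracking-cost question: Storm's Guaranteeing-message-processing docs describe the acker algorithm, which keeps a single 64-bit value per spout tuple and XORs into it the random id of every anchored edge created and every edge acked; the value hits zero exactly when the whole tree is done. That constant-size bookkeeping is why tracking is cheap, and also why the only thing the acker can replay is the root spout tuple, not an intermediate bolt's input. A miniature model of that idea (illustrative only, not the acker's actual code):

```java
// Per spout tuple, the acker keeps one long. Each anchored edge gets a
// random id; creating an edge and acking an edge both XOR the id in,
// so every id appears exactly twice and the value returns to 0 when
// the whole tuple tree has been acked.
public class AckerSketch {
    private long ackVal = 0;

    /** Spout emit: register the root edge of the tree. */
    void spoutEmit(long edgeId) {
        ackVal ^= edgeId;
    }

    /** A bolt acks its input and registers any anchored emits in one update. */
    void boltAck(long inputEdgeId, long... anchoredEmitIds) {
        ackVal ^= inputEdgeId;
        for (long id : anchoredEmitIds) {
            ackVal ^= id;
        }
    }

    boolean treeComplete() {
        return ackVal == 0;
    }
}
```

For S --> BoltA --> BoltB: the spout registers edge e0; BoltA acks e0 while registering its anchored emit e1; BoltB acks e1 and the value collapses to zero. Nothing about which bolt produced which edge is retained, which is exactly the trade-off S G is asking about.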
> -Taylor
>
> On Sep 13, 2016, at 4:42 AM, Cheney Chen <[email protected]> wrote:
>
> Hi there,
>
> We're using storm 1.0.1, and I'm checking through
> http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html
>
> Got questions for the below two scenarios.
>
> Assume topology: S (spout) --> BoltA --> BoltB
>
> 1. S: anchored emit, BoltA: anchored emit
> Suppose BoltB processing fails (the tuple is not acked). What will the
> replay be: will it execute both BoltA and BoltB, or only the failed BoltB
> processing?
>
> 2. S: anchored emit, BoltA: unanchored emit
> Suppose BoltB processing fails (the tuple is not acked). Replay will not
> happen, correct?
>
> --
> Regards,
> Qili Chen (Cheney)
>
> E-mail: [email protected]
> MP: (+1) 4086217503
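Per Taylor's answer above, the two scenarios can be sketched as a toy simulation (illustrative only, not Storm code): when BoltA's emit is anchored, a BoltB failure reaches the spout and the whole tree, BoltA included, re-executes; when it is unanchored, the failure never reaches the spout and nothing is replayed.

```java
// Toy run of S --> BoltA --> BoltB with a single BoltB failure,
// counting bolt executions under anchored vs unanchored emits.
public class ReplaySketch {
    int boltARuns = 0;
    int boltBRuns = 0;

    /** One pass through the topology; returns true if BoltB acks. */
    private boolean runOnce(boolean boltBFails) {
        boltARuns++;
        boltBRuns++;
        return !boltBFails;
    }

    void run(boolean anchored, boolean boltBFailsFirstTry) {
        boolean acked = runOnce(boltBFailsFirstTry);
        if (!acked && anchored) {
            runOnce(false); // spout replays the root: the whole tree re-executes
        }
        // unanchored: the failure never reaches the spout, so no replay
    }
}
```

Scenario 1 ends with BoltA executed twice (the whole tree is replayed, not just BoltB); scenario 2 ends with BoltA executed once and the failed tuple simply lost.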
