In my testing, when a tuple was replayed by a spout, every Kafka message from 
the replayed one to the end was replayed.  That's why all bolts need to be 
idempotent, so that replays do not cause work to be done twice.  I believe it 
has to do with Kafka tracking the offset of the last acked message in a topic, 
rather than tracking the ack of every message individually.  This is a 
simplistic view, as it's a lot more complicated than this.
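To illustrate what I mean by idempotent, here is a minimal sketch (plain Python, nothing to do with the Storm API; all names are illustrative) of a bolt that records which message IDs it has already handled, so a replayed message is skipped rather than processed twice:

```python
# Hypothetical sketch: a bolt made idempotent by remembering the IDs of
# messages it has already processed. A replayed message is then a no-op.

processed_ids = set()   # in practice this would live in a durable store
side_effects = []       # stands in for the bolt's real work (DB writes, etc.)

def process(msg_id, payload):
    if msg_id in processed_ids:
        return                             # replay: work was already done once
    side_effects.append(payload.upper())   # the "real" work
    processed_ids.add(msg_id)

# First delivery of two messages, then a replay of both.
for msg_id, payload in [(1, "a"), (2, "b"), (1, "a"), (2, "b")]:
    process(msg_id, payload)

print(side_effects)   # ['A', 'B'] -- each message's effect applied exactly once
```

In a real topology the processed-ID set would have to be shared and durable (e.g. a database keyed by message ID), but the principle is the same.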

If anybody can confirm this, please respond, as it was a surprise to me and 
cost me a couple of days of testing when I encountered it.


From: [email protected] [mailto:[email protected]]
Sent: Tuesday, September 13, 2016 9:22 PM
To: user
Subject: Re: Re: How will storm replay the tuple tree?

Yes, only the failed tuples are replayed, but the whole batch will be held.

So, if a tuple fails forever, will the batch be held forever?

I am just not clear whether it is the tuple itself, or the batch that owns the 
tuple, that will be held in the spout.


________________________________
Josh



From: Ambud Sharma<mailto:[email protected]>
Date: 2016-09-14 09:10
To: user<mailto:[email protected]>
Subject: Re: Re: How will storm replay the tuple tree?

No, as per the code, only individual messages are replayed.

On Sep 13, 2016 6:09 PM, 
"[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>> wrote:
Hi:

I'd like to clarify something about the Kafka spout with regard to acking.

For example, suppose the kafka-spout fetches offsets 5000-6000 from the Kafka 
server, but one tuple, whose offset is 5101, is failed by a bolt. The whole 
batch of 5000-6000 will remain in the kafka-spout until the 5101 tuple is 
acked. If the 5101 tuple cannot be acked for a long time, the batch 5000-6000 
will remain for a long time, and the kafka-spout will stop fetching data from 
Kafka during that time.
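In pseudo-code terms, my understanding is something like this (a plain-Python sketch of the behaviour, not the actual kafka-spout code): the spout can only advance its commit point over a contiguous run of acked offsets, so one stuck tuple pins everything after it.

```python
# Hypothetical model of offset tracking: the commit point can only move
# forward while every offset below it has been acked, so a single unacked
# offset holds the rest of the batch.

class OffsetTracker:
    """Tracks acked offsets and exposes the committable (replay-from) offset."""

    def __init__(self, start_offset):
        self.committed = start_offset   # next offset a replay would start from
        self.acked = set()

    def ack(self, offset):
        self.acked.add(offset)
        # Advance the commit point over any contiguous run of acked offsets.
        while self.committed in self.acked:
            self.acked.remove(self.committed)
            self.committed += 1

    def replay_range(self, end_offset):
        """Offsets that would be redelivered after a restart/failure."""
        return list(range(self.committed, end_offset))

tracker = OffsetTracker(5000)
for off in range(5000, 5200):
    if off != 5101:          # offset 5101 fails and is never acked
        tracker.ack(off)

# The commit point is stuck at 5101, so 5101..5199 would all be replayed,
# including the messages after it that already succeeded.
print(tracker.committed)                # 5101
print(len(tracker.replay_range(5200)))  # 99
```

Am I right?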



________________________________
Josh



From: Tech Id<mailto:[email protected]>
Date: 2016-09-14 06:26
To: user<mailto:[email protected]>
Subject: Re: How will storm replay the tuple tree?
I agree with this statement about code/architecture, but in the case of some 
system outages, such as one of the end-points (Solr, Couchbase, Elasticsearch, 
etc.) being down temporarily, a very large number of otherwise fully functional 
and healthy systems will receive a large number of duplicate replays 
(especially in high-throughput topologies).

If you can elaborate a little more on the performance cost of tracking tuples 
or point to a document reflecting the same, that will be of great help.

Best,
T.I.

On Tue, Sep 13, 2016 at 12:26 PM, Hart, James W. 
<[email protected]<mailto:[email protected]>> wrote:
Failures should be very infrequent; if they are not, then rethink the code and 
architecture.  The performance cost of tracking tuples in the way that would be 
required to replay from the point of failure is large; basically, that method 
would slow everything way down to handle very infrequent failures.
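For a sense of why whole-tree replay is so cheap to track, the Guaranteeing-message-processing docs describe Storm's acker algorithm: each spout tuple is tracked with a single 64-bit "ack val" that XORs in every edge ID when it is anchored and again when it is acked, reaching zero exactly when the tree completes. A toy sketch (illustrative Python, not Storm's implementation):

```python
# Sketch of the acker idea: constant state per spout tuple, no matter how
# large the tuple tree grows. Tracking enough state to replay from an
# arbitrary bolt in the tree would cost far more than this.
import random

ack_val = 0

def anchor(edge_id):
    global ack_val
    ack_val ^= edge_id      # a new edge enters the tree

def ack(edge_id):
    global ack_val
    ack_val ^= edge_id      # a completed edge leaves the tree

# A tree with 1000 edges: every edge is anchored once and acked once.
edges = [random.getrandbits(64) for _ in range(1000)]
for e in edges:
    anchor(e)
for e in edges:
    ack(e)

print(ack_val == 0)   # True: the whole tree completed, tracked in 8 bytes
```

Because XOR is order-independent, acks can arrive in any order, and the tracker never needs to know the shape of the tree at all.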

From: S G [mailto:[email protected]<mailto:[email protected]>]
Sent: Tuesday, September 13, 2016 3:17 PM
To: [email protected]<mailto:[email protected]>
Subject: Re: How will storm replay the tuple tree?

Hi,

I am a little curious to know why we begin at the spout level for case 1.
If we replay at the failing bolt's parent level (BoltA in this case), then it 
should be more performant due to a decrease in duplicate processing (as 
compared to whole tuple tree replays).

If BoltA crashes for some reason while replaying, only then should the Spout 
receive this as a failure, and the whole tuple tree should be replayed.

This saving in duplicate processing will be more visible with several layers of 
bolts.

I am sure there is a good reason to replay the whole tuple tree, and I would 
like to know what it is.

Thanks
SG

On Tue, Sep 13, 2016 at 10:22 AM, P. Taylor Goetz 
<[email protected]<mailto:[email protected]>> wrote:
Hi Cheney,

Replays happen at the spout level. So if there is a failure at any point in the 
tuple tree (the tuple tree being the anchored emits, unanchored emits don’t 
count), the original spout tuple will be replayed. So the replayed tuple will 
traverse the topology again, including unanchored points.

If an unanchored tuple fails downstream, it will not trigger a replay.
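A toy model of the above (plain Python, assuming nothing about Storm internals; names are illustrative): a failure propagates to the spout only if there is an unbroken chain of anchored emits back to a spout tuple, and an unanchored emit breaks that chain.

```python
# Toy model of replay semantics: a failed tuple triggers a replay of its
# spout tuple only if it is transitively anchored to one.

parent = {}          # tuple -> anchoring parent; absent if emitted unanchored
spout_tuples = set()

def spout_emit(t):
    spout_tuples.add(t)

def bolt_emit(t, anchor=None):
    if anchor is not None:
        parent[t] = anchor   # anchored emit joins the tuple tree

def fail(t):
    """Return the spout tuple to replay, or None for unanchored failures."""
    while t in parent:
        t = parent[t]        # walk the anchor chain toward the spout
    return t if t in spout_tuples else None

spout_emit("s1")
bolt_emit("a1", anchor="s1")   # BoltA, anchored emit
bolt_emit("b1", anchor="a1")   # BoltB, anchored: its failure replays s1
bolt_emit("u1")                # unanchored emit: its failure replays nothing

print(fail("b1"))   # s1
print(fail("u1"))   # None
```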

Hope this helps.

-Taylor


On Sep 13, 2016, at 4:42 AM, Cheney Chen 
<[email protected]<mailto:[email protected]>> wrote:

Hi there,

We're using storm 1.0.1, and I'm checking through 
http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html

I've got questions about the two scenarios below.
Assume topology: S (spout) --> BoltA --> BoltB
1. S: anchored emit, BoltA: anchored emit
Suppose BoltB fails a tuple (instead of acking it). What will the replay be: 
will it execute both BoltA and BoltB, or only the failed BoltB processing?

2. S: anchored emit, BoltA: unanchored emit
Suppose BoltB fails a tuple (instead of acking it); replay will not happen, correct?

--
Regards,
Qili Chen (Cheney)

E-mail: [email protected]<mailto:[email protected]>
MP: (+1) 4086217503


