This isn’t a spout issue; it’s a _topology_ issue. Specifically, I believe 
that tuples delayed somewhere in the topology, perhaps in a queue or a bolt, 
are being overtaken by the batch start/end tuples, which breaks any ordering 
constraints within a batch.

So if this is emitted (right to left), and it is a condition of the topology 
that A is always sent first:
xxxxxxxxxxA

If the batch times out and a batch retry R is emitted while some of the x 
tuples are still delayed, the bolt receives this (again, RTL):
yyyyyyAxxxxxxxRxxxxA

This breaks the condition that A is always the first tuple in a batch.
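
To make the condition concrete, here is a rough sketch of the check as a toy 
model. It is not Storm or Trident API: the class name, the method names, and 
the single-character encoding ('A' = required first tuple, 'R' = batch retry 
marker, anything else = an ordinary tuple) are all made up for illustration, 
and it assumes the diagrams are read right to left as above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT Storm/Trident API.
// 'A' = tuple that must arrive first in a batch attempt,
// 'R' = batch retry marker, any other character = ordinary tuple.
final class BatchOrderCheck {

    // The diagrams are written right to left, so reverse them to get arrival order.
    static String arrivalOrder(String rtlDiagram) {
        return new StringBuilder(rtlDiagram).reverse().toString();
    }

    // Returns the arrival positions at which a batch attempt starts
    // with something other than 'A'.
    static List<Integer> violations(String rtlDiagram) {
        String arrivals = arrivalOrder(rtlDiagram);
        List<Integer> bad = new ArrayList<>();
        boolean expectFirst = true; // true at stream start and right after each 'R'
        for (int i = 0; i < arrivals.length(); i++) {
            char c = arrivals.charAt(i);
            if (c == 'R') {
                expectFirst = true; // a retry begins a new attempt
                continue;
            }
            if (expectFirst && c != 'A') {
                bad.add(i); // a delayed tuple overtook the batch's first tuple
            }
            expectFirst = false;
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println(violations("xxxxxxxxxxA"));
        System.out.println(violations("yyyyyyAxxxxxxxRxxxxA"));
    }
}
```

Running it prints [] for the first diagram and [6] for the second: the 
tuple at arrival position 6 is a delayed x that lands immediately after the 
retry marker, ahead of the replayed A.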

From: Mayur Rustagi [mailto:[email protected]]
Sent: 25 September 2014 11:59
To: [email protected]
Subject: Re: What happens on a batch timeout?

It seems to me that it depends on which spout you are using. If you are using 
Kafka with a transactional spout, the replay is consistent each time. With any 
other queue, the replayed batch may be different.
This page describes the types of spouts and their limitations:
http://storm.incubator.apache.org/documentation/Trident-spouts.html


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi<https://twitter.com/mayur_rustagi>


On Thu, Sep 25, 2014 at 3:10 PM, Simon Cooper 
<[email protected]<mailto:[email protected]>> wrote:
Does anyone have any information that could help with this? I’m baffled by the 
behaviour we’re seeing: events are being received out of order on a batch 
replay. The only reason I can think of is that tuples from the previous batch 
are left over in the input queues, but trying to use the batch id to filter 
tuples doesn’t seem to work.

Unfortunately, I can’t understand the behaviour without some input from someone 
who knows how trident works and can match this behaviour onto what trident is 
*meant* to do on a batch replay.

SimonC

From: Simon Cooper 
[mailto:[email protected]<mailto:[email protected]>]
Sent: 19 August 2014 16:10
To: [email protected]<mailto:[email protected]>
Subject: RE: What happens on a batch timeout?

BTW, I’m referring to trident batches.

From: Simon Cooper [mailto:[email protected]]
Sent: 19 August 2014 15:49
To: [email protected]<mailto:[email protected]>
Subject: What happens on a batch timeout?

When a batch times out, what happens to all the current in-flight tuples when 
the batch is replayed? Are they removed from the executor queues, or are they 
left in the queues, so they might be received by the executor as part of the 
replayed batch/next batch, if the executor is running behind?

SimonC
