(I am a Storm newbie myself.)
AFAIK you would need to add a timestamp in the spout, and then add a sort bolt between Bolt1 and Bolt2.

The bolt logic would be something of this sort:

1. On execute, place the new tuple in a Guava TreeMultiset (newest first).
2. On a separate thread, if the last item in the set is older than 10 seconds and it is not out of order, emit it to the next bolt and ack the tuple. Otherwise, if it is out of order, either emit it anyway or report an error and fail the tuple.

Regards,
Itai

________________________________
From: Bryan Hernandez <[email protected]>
Sent: Tuesday, November 11, 2014 6:08 PM
To: [email protected]
Subject: Re: Ensure tuples are processed in order while still avoiding bottlenecks

I should clarify that I am aware I could make Bolt2 also subscribe to Spout1, so that it knows the correct order. However, I am wondering if there is a built-in Storm way of handling this requirement in general.

Thanks!

Best,
Bryan

On Tue, Nov 11, 2014 at 5:03 PM, Bryan Hernandez <[email protected]> wrote:

Hi,

I'd like to know if there is a way to do the following in Storm:

The topology: Spout1 -> Bolt1 -> Bolt2

Spout1: emits about 1 tuple per second.
Bolt1: execute() method takes, on average, 5 seconds to process each tuple.
Bolt2: must receive tuples in the same order that they were emitted from Spout1.

As I understand it, without parallelization, Bolt1's input queue should grow by 4 tuples every 5 seconds. This, of course, would overflow eventually. However, if I set the parallelism_hint argument of Bolt1 equal to 5, then it should be fine.

Here's the problem: I cannot guarantee that the processing time in Bolt1 will always be 5 seconds. So it could be that a tuple received by Bolt1 later in time is emitted before tuples that were received earlier than it. In other words, using parallelism, I could have Bolt2 receiving [t2, t1, t3], for tuples emitted from Spout1 as [t1, t2, t3].

Is there a way to make sure that 1) Bolt2 receives the tuples in order, and 2) Bolt1 doesn't fall behind the emission rate of Spout1?

Thanks!

Best,
Bryan
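
For reference, the sort bolt logic suggested in the reply at the top of this thread could be sketched roughly as follows. This is only a sketch under assumptions, not a tested Storm component: the names ReorderBuffer, Stamped, Drained and holdMs are made up for illustration, it uses java.util.PriorityQueue in place of the Guava TreeMultiset so that it has no dependencies, and it is deliberately not wired to any Storm API. It assumes the spout attaches a millisecond timestamp to each tuple.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper (not part of Storm): holds items for a fixed delay, then releases them oldest first. */
final class ReorderBuffer<T> {

    /** An item paired with the timestamp assumed to be attached by the spout. */
    static final class Stamped<T> {
        final long timestampMs;
        final T value;

        Stamped(long timestampMs, T value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    /** Result of one drain pass. */
    static final class Drained<T> {
        /** Items released in non-decreasing timestamp order: emit and ack these. */
        final List<Stamped<T>> inOrder = new ArrayList<>();
        /** Items older than something already released: emit anyway, or report an error and fail. */
        final List<Stamped<T>> late = new ArrayList<>();
    }

    private final long holdMs;
    // Min-heap on timestamp, so peek() is always the oldest pending item.
    // Items with equal timestamps come out in no particular order.
    private final PriorityQueue<Stamped<T>> pending =
            new PriorityQueue<Stamped<T>>((a, b) -> Long.compare(a.timestampMs, b.timestampMs));
    private long lastReleasedMs = Long.MIN_VALUE;

    ReorderBuffer(long holdMs) {
        this.holdMs = holdMs;
    }

    /** Step 1: called for every incoming item (from the bolt's execute() in a real topology). */
    synchronized void add(long timestampMs, T value) {
        pending.add(new Stamped<>(timestampMs, value));
    }

    /** Step 2: called periodically; releases every item that has been held for at least holdMs. */
    synchronized Drained<T> drain(long nowMs) {
        Drained<T> out = new Drained<>();
        while (!pending.isEmpty() && nowMs - pending.peek().timestampMs >= holdMs) {
            Stamped<T> next = pending.poll();
            if (next.timestampMs >= lastReleasedMs) {
                lastReleasedMs = next.timestampMs;
                out.inOrder.add(next);
            } else {
                // Arrived after a newer item had already been released.
                out.late.add(next);
            }
        }
        return out;
    }
}
```

In a real bolt, execute() would call add() with the tuple and its spout timestamp, and a separate thread would call drain(System.currentTimeMillis()) every so often, emitting and acking the inOrder items and either emitting or failing the late ones. With holdMs set to 10000 this matches the 10 second window above. The hold time has to be longer than the worst-case processing time in Bolt1, otherwise tuples will still show up as late.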
