Hi, I'd like to know if there is a way to do the following in Storm:
The topology: Spout1 -> Bolt1 -> Bolt2 *Spout1*: emits *about* 1 tuple per second. *Bolt1*: execute() method takes, *on average*, 5 seconds to process each tuple. *Bolt2*: must receive tuples in the same order that they were emitted from Spout1. As I understand it, without parallelization, Bolt1's input queue should grow by 4 tuples every 5 seconds. This, of course, would overflow eventually. However, if I set the parralelism_hint argument of Bolt1 equal to 5, then it should be fine. Here's the problem: I cannot guarantee that the processing time in Bolt1 will always be 5 seconds. So it could be that a tuple received by Bolt1 later in time is emitted before tuples that were received earlier than it. In other words, using parallelism, I could have Bolt2 receiving [t2, t1, t3], for tuples emitted from Spout1 as [t1, t2, t3]. Is there a way to make sure that 1) Bolt2 receives the tuples in order, as well as 2) ensuring the Bolt1 doesn't fall behind of the emission rate in of Spout1? Thanks! Best, Bryan
