Why don't you try Trident with a batch size of 1 tuple and a partitionPersist as the last operation?
Trident ensures batches are applied in the order they were emitted, even when they are processed in parallel.

On 11/11/14, Bryan Hernandez <[email protected]> wrote:
> Hi,
>
> I'd like to know if there is a way to do the following in Storm:
>
> The topology:
>
> Spout1 -> Bolt1 -> Bolt2
>
> *Spout1*: emits *about* 1 tuple per second.
> *Bolt1*: its execute() method takes, *on average*, 5 seconds to process
> each tuple.
> *Bolt2*: must receive tuples in the same order that they were emitted
> from Spout1.
>
> As I understand it, without parallelization, Bolt1's input queue would
> grow by 4 tuples every 5 seconds and eventually overflow. However, if I
> set the parallelism_hint argument of Bolt1 to 5, then it should be fine.
>
> Here's the problem:
>
> I cannot guarantee that the processing time in Bolt1 will always be 5
> seconds. So a tuple received by Bolt1 later in time could be emitted
> before tuples that were received earlier. In other words, with
> parallelism, Bolt2 could receive [t2, t1, t3] for tuples emitted from
> Spout1 as [t1, t2, t3].
>
> Is there a way to make sure that 1) Bolt2 receives the tuples in order,
> and 2) Bolt1 doesn't fall behind Spout1's emission rate?
>
> Thanks!
>
> Best,
> Bryan
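If you'd rather stay on plain Storm instead of Trident, one common workaround is to have Spout1 tag each tuple with a monotonically increasing sequence number and put a resequencing buffer in front of Bolt2 (either inside Bolt2 itself or as a single-instance bolt between Bolt1 and Bolt2). A minimal sketch of that idea, with a hypothetical Resequencer class that is not part of Storm:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resequencing buffer: assumes Spout1 attaches a
// monotonically increasing sequence number to every tuple. Out-of-order
// arrivals from the parallel Bolt1 tasks are held back until every
// earlier sequence number has arrived, so Bolt2 sees emit order.
public class Resequencer<T> {
    private final Map<Long, T> pending = new HashMap<>();
    private long nextSeq = 0; // next sequence number eligible for release

    // Accept one tuple with its sequence number; return the (possibly
    // empty) run of tuples that is now safe to forward in order.
    public List<T> accept(long seq, T value) {
        pending.put(seq, value);
        List<T> ready = new ArrayList<>();
        while (pending.containsKey(nextSeq)) {
            ready.add(pending.remove(nextSeq));
            nextSeq++;
        }
        return ready;
    }
}
```

Note the trade-off: pending tuples are buffered in memory while a slow Bolt1 task holds up an earlier sequence number, so this only bounds memory if Bolt1's latency is bounded; Trident's ordered batch updates give you the same ordering guarantee without writing this yourself.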
