If you do this, be careful with timeouts: it is very easy to cause cascading failures if you don't handle timed-out tuples gracefully. On Nov 14, 2014 4:35 AM, "Susheel Kumar Gadalay" <[email protected]> wrote:
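To make the timeout warning concrete: in Storm, a tuple that is not acked within the topology's message timeout is replayed by the spout, and replays into an already-backlogged Bolt1 make the backlog worse. A hedged configuration sketch (real `Config` setters from the Storm API; the specific values are illustrative, not recommendations):

```java
// Assumes backtype.storm.Config (or org.apache.storm.Config in newer versions).
Config conf = new Config();
// Default is 30s; size this above Bolt1's worst-case queue-wait + processing
// time, or slow tuples will time out and be replayed while still in flight.
conf.setMessageTimeoutSecs(60);
// Cap in-flight tuples so a slow Bolt1 applies backpressure to the spout
// instead of letting timed-out replays pile up on top of the backlog.
conf.setMaxSpoutPending(10);
```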
> Why don't you try Trident with a batch size of 1 tuple and the last bolt
> as a partitionPersist?
>
> Trident ensures batches are updated in the order of emit even if
> processed in parallel.
>
> On 11/11/14, Bryan Hernandez <[email protected]> wrote:
> > Hi,
> >
> > I'd like to know if there is a way to do the following in Storm:
> >
> > The topology:
> >
> > Spout1 -> Bolt1 -> Bolt2
> >
> > *Spout1*: emits *about* 1 tuple per second.
> > *Bolt1*: execute() method takes, *on average*, 5 seconds to process
> > each tuple.
> > *Bolt2*: must receive tuples in the same order that they were emitted
> > from Spout1.
> >
> > As I understand it, without parallelization, Bolt1's input queue should
> > grow by 4 tuples every 5 seconds. This, of course, would overflow
> > eventually. However, if I set the parallelism_hint argument of Bolt1
> > equal to 5, then it should be fine.
> >
> > Here's the problem:
> >
> > I cannot guarantee that the processing time in Bolt1 will always be 5
> > seconds. So it could be that a tuple received by Bolt1 later in time is
> > emitted before tuples that were received earlier. In other words, with
> > parallelism, Bolt2 could receive [t2, t1, t3] for tuples emitted from
> > Spout1 as [t1, t2, t3].
> >
> > Is there a way to 1) make sure that Bolt2 receives the tuples in order,
> > as well as 2) ensure that Bolt1 doesn't fall behind the emission rate
> > of Spout1?
> >
> > Thanks!
> >
> > Best,
> > Bryan
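Besides the Trident route suggested above, another common approach is to resequence in Bolt2 itself: have Spout1 attach a monotonically increasing sequence number to each tuple, let the parallel Bolt1 instances pass it through, and have Bolt2 buffer out-of-order arrivals until the gap closes. A minimal sketch of that buffering logic in plain Java, with no Storm dependencies (the class and method names here are illustrative, not Storm API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical resequencer that a Bolt2 implementation could hold as state.
// Tuples may arrive in any order; they are released strictly by sequence number.
public class Resequencer {
    // Out-of-order tuples parked until their predecessors arrive,
    // keyed by sequence number (TreeMap keeps keys sorted).
    private final TreeMap<Long, String> buffer = new TreeMap<>();
    private long nextSeq = 0; // next sequence number we are allowed to release

    // Accept one tuple with its sequence number; return every tuple that
    // is now releasable in emission order (possibly none).
    public List<String> accept(long seq, String value) {
        buffer.put(seq, value);
        List<String> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.firstKey() == nextSeq) {
            ready.add(buffer.pollFirstEntry().getValue());
            nextSeq++;
        }
        return ready;
    }
}
```

For the example in the question, `accept(1, "t2")` returns nothing (t1 is still missing), and a later `accept(0, "t1")` releases [t1, t2] in order. Note the buffer is unbounded as written; combined with `setMaxSpoutPending`, the in-flight cap keeps it from growing without limit.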
