To clarify, I meant implementing a sorting bolt in core Storm. I haven't used Trident; would having only one tuple per batch incur a high overhead?

On Nov 14, 2014 7:57 AM, "Nathan Leung" <[email protected]> wrote:
> If you do this, be careful with timeouts. It will be very easy to have
> cascading failures if you don't handle timed-out tuples in a good manner.
>
> On Nov 14, 2014 4:35 AM, "Susheel Kumar Gadalay" <[email protected]> wrote:
>
>> Why don't you try Trident with a batch size of 1 tuple and the last bolt as
>> partition persist?
>>
>> Trident ensures batches are updated in the order of emit even if
>> processed in parallel.
>>
>> On 11/11/14, Bryan Hernandez <[email protected]> wrote:
>> > Hi,
>> >
>> > I'd like to know if there is a way to do the following in Storm.
>> >
>> > The topology:
>> >
>> > Spout1 -> Bolt1 -> Bolt2
>> >
>> > *Spout1*: emits *about* 1 tuple per second.
>> > *Bolt1*: execute() method takes, *on average*, 5 seconds to process each
>> > tuple.
>> > *Bolt2*: must receive tuples in the same order that they were emitted
>> > from Spout1.
>> >
>> > As I understand it, without parallelization, Bolt1's input queue would
>> > grow by 4 tuples every 5 seconds and eventually overflow. However, if I
>> > set the parallelism_hint argument of Bolt1 equal to 5, then it should
>> > keep up.
>> >
>> > Here's the problem:
>> >
>> > I cannot guarantee that the processing time in Bolt1 will always be 5
>> > seconds. So a tuple received by Bolt1 later in time could be emitted
>> > before tuples that were received earlier. In other words, with
>> > parallelism, Bolt2 could receive [t2, t1, t3] for tuples emitted from
>> > Spout1 as [t1, t2, t3].
>> >
>> > Is there a way to 1) make sure Bolt2 receives the tuples in order, while
>> > 2) ensuring Bolt1 doesn't fall behind the emission rate of Spout1?
>> >
>> > Thanks!
>> >
>> > Best,
>> > Bryan
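For what it's worth, the core of such a "sorting bolt" can be sketched independently of the Storm API: tag each tuple with a monotonically increasing sequence number at the spout, then have a resequencer downstream that buffers out-of-order arrivals and releases them in emission order. This is a minimal, hypothetical sketch (the class name and `offer` method are my own, not part of Storm); inside a bolt's execute() you would call offer(...) and emit whatever it returns, keeping in mind Nathan's warning that buffered tuples are still subject to the topology's tuple timeout:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical resequencer: buffers items keyed by sequence number
// and releases them strictly in order. State is per-instance, so the
// bolt wrapping it must run with parallelism 1 (or be partitioned by
// a key so each instance sees a gap-free sequence).
public class Resequencer<T> {
    private final TreeMap<Long, T> pending = new TreeMap<>();
    private long nextSeq;

    public Resequencer(long firstSeq) {
        this.nextSeq = firstSeq;
    }

    /** Accepts one item; returns every item now releasable, in order. */
    public List<T> offer(long seq, T item) {
        pending.put(seq, item);
        List<T> ready = new ArrayList<>();
        // Drain the buffer while the next expected sequence is present.
        while (!pending.isEmpty() && pending.firstKey() == nextSeq) {
            ready.add(pending.pollFirstEntry().getValue());
            nextSeq++;
        }
        return ready;
    }
}
```

With this approach Bolt1 can stay parallel (parallelism 5 to absorb the 5-second average processing time), and only the cheap reordering step is serialized; note the buffer grows unboundedly if any tuple is lost, so failed tuples must be replayed or skipped explicitly.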
