If you do this, be careful with timeouts: it is very easy to cause cascading failures if you don't handle timed-out tuples gracefully. On Nov 14, 2014 4:35 AM, "Susheel Kumar Gadalay" <[email protected]> wrote:
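To make the timeout warning concrete: in Storm, a tuple that is not acked within the topology's message timeout is replayed by the spout, and replays into an already-backlogged Bolt1 make the backlog worse. A hedged configuration sketch (real `Config` setters from the Storm API; the specific values are illustrative, not recommendations):

```java
// Assumes backtype.storm.Config (or org.apache.storm.Config in newer versions).
Config conf = new Config();
// Default is 30s; size this above Bolt1's worst-case queue-wait + processing
// time, or slow tuples will time out and be replayed while still in flight.
conf.setMessageTimeoutSecs(60);
// Cap in-flight tuples so a slow Bolt1 applies backpressure to the spout
// instead of letting timed-out replays pile up on top of the backlog.
conf.setMaxSpoutPending(10);
```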
> Why don't you try Trident with a batch size of 1 tuple and the last bolt
> as a partitionPersist?
>
> Trident ensures batches are updated in the order of emit even if
> processed in parallel.
>
> On 11/11/14, Bryan Hernandez <[email protected]> wrote:
> > Hi,
> >
> > I'd like to know if there is a way to do the following in Storm:
> >
> > The topology:
> >
> > Spout1 -> Bolt1 -> Bolt2
> >
> > *Spout1*: emits *about* 1 tuple per second.
> > *Bolt1*: execute() method takes, *on average*, 5 seconds to process
> > each tuple.
> > *Bolt2*: must receive tuples in the same order that they were emitted
> > from Spout1.
> >
> > As I understand it, without parallelization, Bolt1's input queue should
> > grow by 4 tuples every 5 seconds. This, of course, would overflow
> > eventually. However, if I set the parallelism_hint argument of Bolt1
> > equal to 5, then it should be fine.
> >
> > Here's the problem:
> >
> > I cannot guarantee that the processing time in Bolt1 will always be 5
> > seconds. So it could be that a tuple received by Bolt1 later in time is
> > emitted before tuples that were received earlier. In other words, with
> > parallelism, Bolt2 could receive [t2, t1, t3] for tuples emitted from
> > Spout1 as [t1, t2, t3].
> >
> > Is there a way to 1) make sure that Bolt2 receives the tuples in order,
> > as well as 2) ensure that Bolt1 doesn't fall behind the emission rate
> > of Spout1?
> >
> > Thanks!
> >
> > Best,
> > Bryan
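Besides the Trident route suggested above, another common approach is to resequence in Bolt2 itself: have Spout1 attach a monotonically increasing sequence number to each tuple, let the parallel Bolt1 instances pass it through, and have Bolt2 buffer out-of-order arrivals until the gap closes. A minimal sketch of that buffering logic in plain Java, with no Storm dependencies (the class and method names here are illustrative, not Storm API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical resequencer that a Bolt2 implementation could hold as state.
// Tuples may arrive in any order; they are released strictly by sequence number.
public class Resequencer {
    // Out-of-order tuples parked until their predecessors arrive,
    // keyed by sequence number (TreeMap keeps keys sorted).
    private final TreeMap<Long, String> buffer = new TreeMap<>();
    private long nextSeq = 0; // next sequence number we are allowed to release

    // Accept one tuple with its sequence number; return every tuple that
    // is now releasable in emission order (possibly none).
    public List<String> accept(long seq, String value) {
        buffer.put(seq, value);
        List<String> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.firstKey() == nextSeq) {
            ready.add(buffer.pollFirstEntry().getValue());
            nextSeq++;
        }
        return ready;
    }
}
```

For the example in the question, `accept(1, "t2")` returns nothing (t1 is still missing), and a later `accept(0, "t1")` releases [t1, t2] in order. Note the buffer is unbounded as written; combined with `setMaxSpoutPending`, the in-flight cap keeps it from growing without limit.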
