Why don't you try Trident with a batch size of 1 tuple and a partitionPersist as the last operation?
Trident ensures batches are applied in the order they were emitted, even when they are processed in parallel.

On 11/11/14, Bryan Hernandez <[email protected]> wrote:
> Hi,
>
> I'd like to know if there is a way to do the following in Storm:
>
> The topology:
>
> Spout1 -> Bolt1 -> Bolt2
>
> *Spout1*: emits *about* 1 tuple per second.
> *Bolt1*: its execute() method takes, *on average*, 5 seconds to process
> each tuple.
> *Bolt2*: must receive tuples in the same order that they were emitted
> from Spout1.
>
> As I understand it, without parallelization, Bolt1's input queue would
> grow by 4 tuples every 5 seconds and eventually overflow. However, if I
> set the parallelism_hint argument of Bolt1 to 5, then it should be fine.
>
> Here's the problem:
>
> I cannot guarantee that the processing time in Bolt1 will always be 5
> seconds. So a tuple received by Bolt1 later in time could be emitted
> before tuples that were received earlier. In other words, with
> parallelism, Bolt2 could receive [t2, t1, t3] for tuples emitted from
> Spout1 as [t1, t2, t3].
>
> Is there a way to make sure that 1) Bolt2 receives the tuples in order,
> and 2) Bolt1 doesn't fall behind Spout1's emission rate?
>
> Thanks!
>
> Best,
> Bryan
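If you'd rather stay on plain Storm instead of Trident, one common workaround is to have Spout1 tag each tuple with a monotonically increasing sequence number and put a resequencing buffer in front of Bolt2 (either inside Bolt2 itself or as a single-instance bolt between Bolt1 and Bolt2). A minimal sketch of that idea, with a hypothetical Resequencer class that is not part of Storm:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resequencing buffer: assumes Spout1 attaches a
// monotonically increasing sequence number to every tuple. Out-of-order
// arrivals from the parallel Bolt1 tasks are held back until every
// earlier sequence number has arrived, so Bolt2 sees emit order.
public class Resequencer<T> {
    private final Map<Long, T> pending = new HashMap<>();
    private long nextSeq = 0; // next sequence number eligible for release

    // Accept one tuple with its sequence number; return the (possibly
    // empty) run of tuples that is now safe to forward in order.
    public List<T> accept(long seq, T value) {
        pending.put(seq, value);
        List<T> ready = new ArrayList<>();
        while (pending.containsKey(nextSeq)) {
            ready.add(pending.remove(nextSeq));
            nextSeq++;
        }
        return ready;
    }
}
```

Note the trade-off: pending tuples are buffered in memory while a slow Bolt1 task holds up an earlier sequence number, so this only bounds memory if Bolt1's latency is bounded; Trident's ordered batch updates give you the same ordering guarantee without writing this yourself.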
