Re: Ensure tuples are processed in order while still avoiding bottlenecks

Bryan Hernandez Tue, 11 Nov 2014 08:09:20 -0800

I should clarify that I am aware I could make Bolt2 also subscribe to
Spout1, so that it knows the correct order.  However, I am wondering if
there is a built-in Storm way of handling this requirement in general.


Thanks!

Best,

Bryan


On Tue, Nov 11, 2014 at 5:03 PM, Bryan Hernandez <[email protected]>
wrote:

> Hi,
>
> I'd like to know if there is a way to do the following in Storm:
>
> The topology:
>
> Spout1 -> Bolt1 -> Bolt2
>
> *Spout1*: emits *about* 1 tuple per second.
> *Bolt1*: execute() method takes, *on average*, 5 seconds to process each
> tuple.
> *Bolt2*: must receive tuples in the same order that they were emitted
> from Spout1.
>
> As I understand it, without parallelization, Bolt1's input queue should
> grow by 4 tuples every 5 seconds.  This, of course, would overflow
> eventually.  However, if I set the parralelism_hint argument of Bolt1 equal
> to 5, then it should be fine.
>
> Here's the problem:
>
> I cannot guarantee that the processing time in Bolt1 will always be 5
> seconds.  So it could be that a tuple received by Bolt1 later in time is
> emitted before tuples that were received earlier than it.  In other words,
> using parallelism, I could have Bolt2 receiving [t2, t1, t3], for tuples
> emitted from Spout1 as [t1, t2, t3].
>
> Is there a way to make sure that 1) Bolt2 receives the tuples in order, as
> well as 2) ensuring the Bolt1 doesn't fall behind of the emission rate in
> of Spout1?
>
> Thanks!
>
> Best,
> Bryan
>
>

Re: Ensure tuples are processed in order while still avoiding bottlenecks

Reply via email to