Hi,

I'd like to know if there is a way to do the following in Storm:

The topology:

Spout1 -> Bolt1 -> Bolt2

*Spout1*: emits *about* 1 tuple per second.
*Bolt1*: execute() method takes, *on average*, 5 seconds to process each
tuple.
*Bolt2*: must receive tuples in the same order that they were emitted from
Spout1.

As I understand it, without parallelization, Bolt1's input queue should
grow by 4 tuples every 5 seconds.  This, of course, would overflow
eventually.  However, if I set the parallelism_hint argument of Bolt1 to
5, then it should be fine.
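For what it's worth, here is the back-of-the-envelope arithmetic as a quick sketch (assuming a steady 1 tuple/s in and exactly 5 s of service time per executor; the function name is just for illustration):

```python
def queue_growth(seconds, workers, arrival_rate=1.0, service_time=5.0):
    """Tuples left waiting after `seconds`, given `workers` parallel executors."""
    arrived = arrival_rate * seconds
    # Executors can't process more tuples than have arrived.
    processed = min(arrived, workers * seconds / service_time)
    return arrived - processed

# With one executor the backlog grows by 4 tuples every 5 seconds;
# with 5 executors the throughput matches the arrival rate.
print(queue_growth(5, workers=1))   # backlog after 5 s, single executor
print(queue_growth(5, workers=5))   # backlog after 5 s, five executors
```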

Here's the problem:

I cannot guarantee that the processing time in Bolt1 will always be 5
seconds.  So a tuple received by Bolt1 later in time could be emitted
before tuples that arrived earlier.  In other words, with parallelism,
Bolt2 could receive [t2, t1, t3] for tuples emitted from Spout1 as
[t1, t2, t3].

Is there a way to make sure that 1) Bolt2 receives the tuples in order,
and 2) Bolt1 doesn't fall behind Spout1's emission rate?
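For context, this is the kind of re-sequencing I have in mind on the Bolt2 side, as a plain-Python sketch (it assumes Spout1 tags each tuple with a monotonically increasing sequence id, which Storm doesn't do for me; the `Resequencer` class and its methods are hypothetical names, not Storm API):

```python
class Resequencer:
    """Buffers out-of-order tuples and releases them in sequence order."""

    def __init__(self):
        self.next_seq = 0   # sequence id of the next tuple to release
        self.pending = {}   # out-of-order tuples, keyed by sequence id

    def receive(self, seq, tup):
        """Buffer tup; return the list of tuples now releasable, in order."""
        self.pending[seq] = tup
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

# Tuples arriving as [t2, t1, t3] come back out as [t1, t2, t3]:
r = Resequencer()
print(r.receive(1, "t2"))   # t1 still missing, nothing released
print(r.receive(0, "t1"))   # releases t1 and the buffered t2
print(r.receive(2, "t3"))   # releases t3
```

The obvious downside is that the buffer is unbounded if a tuple is lost or a Bolt1 executor stalls, so I'm wondering whether Storm has a built-in way to do this instead.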

Thanks!

Best,
Bryan
