To clarify, I meant implementing a sorting bolt in core Storm.

I haven't used Trident; would having only one tuple per batch incur high
overhead?
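For the core-Storm approach, the reordering logic itself is small enough to
live inside the bolt. Below is a minimal sketch of just that logic in plain
Java, with the Storm API omitted; the idea of a spout-assigned sequence
number and the `Resequencer` class name are my own assumptions, not part of
Storm:

```java
import java.util.*;

/**
 * Core of a "sorting bolt" sketch: buffers tuples that arrive out of
 * order and releases them strictly in sequence. Assumes Spout1 attaches
 * a monotonically increasing sequence number to every tuple.
 */
public class Resequencer<T> {
    private long nextSeq;                                       // next sequence number to release
    private final SortedMap<Long, T> pending = new TreeMap<>(); // buffered out-of-order tuples

    public Resequencer(long firstSeq) {
        this.nextSeq = firstSeq;
    }

    /** Accept one tuple; return every tuple that is now releasable, in order. */
    public List<T> accept(long seq, T value) {
        pending.put(seq, value);
        List<T> ready = new ArrayList<>();
        // Drain the buffer while its smallest key matches the expected sequence.
        while (!pending.isEmpty() && pending.firstKey() == nextSeq) {
            ready.add(pending.remove(nextSeq));
            nextSeq++;
        }
        return ready;
    }

    /** Number of tuples held back waiting for an earlier sequence number. */
    public int buffered() {
        return pending.size();
    }
}
```

In a real bolt, execute() would call accept() with the tuple's sequence
field and emit whatever comes back. Per the timeout caution below: if a
sequence number can go permanently missing (failed, never replayed), the
buffer stalls forever, so timed-out tuples need explicit handling.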
On Nov 14, 2014 7:57 AM, "Nathan Leung" <[email protected]> wrote:

> If you do this be careful with timeouts. It will be very easy to have
> cascading failures if you don't handle timed out tuples in a good manner.
> On Nov 14, 2014 4:35 AM, "Susheel Kumar Gadalay" <[email protected]>
> wrote:
>
>> Why don't you try Trident with a batch size of 1 tuple and the last
>> bolt as partitionPersist?
>>
>> Trident ensures batches update state in the order they were emitted,
>> even if they are processed in parallel.
>>
>> On 11/11/14, Bryan Hernandez <[email protected]> wrote:
>> > Hi,
>> >
>> > I'd like to know if there is a way to do the following in Storm:
>> >
>> > The topology:
>> >
>> > Spout1 -> Bolt1 -> Bolt2
>> >
>> > *Spout1*: emits *about* 1 tuple per second.
>> > *Bolt1*: execute() method takes, *on average*, 5 seconds to process each
>> > tuple.
>> > *Bolt2*: must receive tuples in the same order that they were emitted
>> from
>> > Spout1.
>> >
>> > As I understand it, without parallelization, Bolt1's input queue should
>> > grow by 4 tuples every 5 seconds.  This, of course, would overflow
>> > eventually.  However, if I set the parallelism_hint argument of
>> > Bolt1 equal to 5, then it should be fine.
>> >
>> > Here's the problem:
>> >
>> > I cannot guarantee that the processing time in Bolt1 will always be 5
>> > seconds.  So it could be that a tuple received by Bolt1 later in time is
>> > emitted before tuples that were received earlier than it.  In other
>> words,
>> > using parallelism, I could have Bolt2 receiving [t2, t1, t3], for tuples
>> > emitted from Spout1 as [t1, t2, t3].
>> >
>> > Is there a way to 1) make sure that Bolt2 receives the tuples in
>> > order, while 2) ensuring that Bolt1 doesn't fall behind the
>> > emission rate of Spout1?
>> >
>> > Thanks!
>> >
>> > Best,
>> > Bryan
>> >
>>
>
