You're right. For that I would have to manage the queue and all the
complexities of timeouts. If Storm is not the right place to do this, then
what is?



On Tue, Sep 20, 2016 at 8:25 PM, Ambud Sharma <asharma52...@gmail.com>
wrote:

> The correct way is to perform time window aggregation using bucketing.
>
> Use the timestamp on your event, computed from the various stages, and
> send it to a single bolt where the aggregation happens. You only emit from
> this bolt once you receive results from both parts.
>
> It's like creating a barrier, or the join phase of a fork-join pool.
>
> That said, the more important question is: is Storm the right place to do
> this? When you perform time-window aggregation you are susceptible to
> tuple timeouts, and you also have to make sure your aggregation is
> idempotent.
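The barrier described above can be sketched outside Storm as plain data-structure logic (a minimal sketch; the field names and the two-branch layout are illustrative assumptions, not part of any Storm API):

```python
# Sketch of the "barrier" / join-phase aggregation: partial results are
# bucketed by event id, and a merged result is released only once every
# expected branch has reported in.
EXPECTED_BRANCHES = {"branch-a", "branch-b"}  # illustrative branch names

class JoinBarrier:
    def __init__(self):
        # event_id -> {branch: payload} for events still missing a branch
        self.pending = {}

    def accept(self, event_id, branch, payload):
        """Store one partial result. Returns the merged dict once all
        branches have arrived, else None. A re-delivered tuple (e.g. a
        replay after a Storm tuple timeout) just overwrites the same
        slot, which keeps the aggregation idempotent."""
        parts = self.pending.setdefault(event_id, {})
        parts[branch] = payload
        if set(parts) == EXPECTED_BRANCHES:
            return self.pending.pop(event_id)
        return None
```

In a real bolt the same logic would live in `execute()`, keyed on a tuple field; pending buckets would also need to be expired eventually, since replayed tuples after a timeout will repopulate them.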
>
> On Sep 20, 2016 7:49 AM, "Harsh Choudhary" <shry.ha...@gmail.com> wrote:
>
>> But how would that solve the syncing problem?
>>
>>
>>
>> On Tue, Sep 20, 2016 at 8:12 PM, Alberto São Marcos <
>> alberto....@gmail.com> wrote:
>>
>>> I would dump the *Bolt-A* results in a shared-data-store/queue and have
>>> a separate workflow with another spout and Bolt-B draining from there
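The decoupling suggested here can be sketched with an in-process queue standing in for the shared data store (a hedged illustration only; in practice this would be Kafka, Redis, or similar, not `queue.Queue`):

```python
import queue

# Topology 1: Bolt-A writes its results to an external shared queue.
# Topology 2: a separate spout drains that queue and feeds Bolt-B.
shared = queue.Queue()  # stand-in for the shared data store / queue

def bolt_a_emit(result):
    # Bolt-A side: hand the result off instead of emitting downstream.
    shared.put(result)

def drain_spout():
    # Second workflow's spout: pull whatever has accumulated so far.
    out = []
    while not shared.empty():
        out.append(shared.get())
    return out
```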
>>>
>>> On Tue, Sep 20, 2016 at 9:20 AM, Harsh Choudhary <shry.ha...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I am thinking of doing the following.
>>>>
>>>> The spout subscribes to Kafka and gets JSONs. The spout emits each
>>>> JSON as an individual tuple.
>>>>
>>>> Bolt-A subscribes to the spout. Bolt-A creates multiple JSONs from
>>>> each JSON and emits them as multiple streams.
>>>>
>>>> Bolt-B receives these streams and does the computation on them.
>>>>
>>>> I need to build a cumulative result, in a bolt, from all the multiple
>>>> JSONs that emerged from a single JSON. But a bolt's static instance
>>>> variable is only shared between tasks within a worker. How do I
>>>> achieve this syncing?
>>>>
>>>>                               --->
>>>> Spout ---> Bolt-A   --->   Bolt-B  ---> Final result
>>>>                               --->
>>>>
>>>> The final result is per JSON which was read from Kafka.
>>>>
>>>> Or is there a better way to achieve this?
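One standard answer to the per-worker-state problem above is a fields grouping on the parent JSON's id: Storm routes tuples with the same field value to the same Bolt-B task, so ordinary per-task state suffices. A minimal sketch of that routing idea (the task count and ids are illustrative, and the hash partition only models how a fields grouping behaves):

```python
# Fields grouping behaves like a stable hash partition: every fragment
# carrying the same parent_id lands on the same task, so one task sees
# all fragments of one JSON and can accumulate them locally.
NUM_TASKS = 4  # illustrative Bolt-B parallelism

def fields_grouping(parent_id, num_tasks=NUM_TASKS):
    # Deterministic within a process run: same key -> same task index.
    return hash(parent_id) % num_tasks
```

With this routing, the cumulative result for one parent JSON can be kept in an instance variable of the single Bolt-B task that receives all of its fragments, with no cross-worker sharing needed.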
>>>>
>>>
>>>
>>
