What's your use case?

The complexities can be managed as long as your tuple branching is
reasonable, i.e. one tuple creates several other tuples and you need to
sync results between them.

On Sep 20, 2016 8:19 AM, "Harsh Choudhary" <shry.ha...@gmail.com> wrote:

> You're right. For that I have to manage the queue and all those
> complexities of timeout. If Storm is not the right place to do this then
> what else?
>
>
>
> On Tue, Sep 20, 2016 at 8:25 PM, Ambud Sharma <asharma52...@gmail.com>
> wrote:
>
>> The correct way is to perform time window aggregation using bucketing.
>>
>> Use the timestamp on your event, computed from the various stages, and send
>> it to a single bolt where the aggregation happens. You only emit from this
>> bolt once you have received results from both parts.
>>
>> It's like creating a barrier or the join phase of a fork join pool.
>>
>> That said, the more important question is: is Storm the right place to do
>> this? When you perform time-window aggregation you are susceptible to tuple
>> timeouts, and you also have to make sure your aggregation is idempotent.
>>
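The barrier/join described above can be sketched independently of the Storm API. In this illustrative sketch (the class and method names are mine, not Storm's), partial results are bucketed by a correlation id, such as the id of the original Kafka message, and the merged bucket is released only once all expected parts have arrived:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the barrier logic: partial results are bucketed by a
// correlation id, and the full bucket is emitted only when every expected
// part has been seen. In a real Storm bolt this would live in execute(),
// and the bucket state would need eviction to cope with tuple timeouts.
class BarrierAggregator {
    private final Map<String, List<String>> buckets = new HashMap<>();

    /**
     * Record one partial result. Returns the complete list of parts when
     * the bucket is full, or null while parts are still outstanding.
     */
    List<String> collect(String correlationId, String part, int expectedParts) {
        List<String> bucket =
            buckets.computeIfAbsent(correlationId, k -> new ArrayList<>());
        bucket.add(part);
        if (bucket.size() < expectedParts) {
            return null; // barrier not yet satisfied
        }
        return buckets.remove(correlationId); // release and clear state
    }
}
```

In Storm you would route every part for a given correlation id to the same aggregator task with a fields grouping on that id, so the per-task map above is sufficient.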
>> On Sep 20, 2016 7:49 AM, "Harsh Choudhary" <shry.ha...@gmail.com> wrote:
>>
>>> But how would that solve the syncing problem?
>>>
>>>
>>>
>>> On Tue, Sep 20, 2016 at 8:12 PM, Alberto São Marcos <
>>> alberto....@gmail.com> wrote:
>>>
>>>> I would dump the *Bolt-A* results in a shared-data-store/queue and
>>>> have a separate workflow with another spout and Bolt-B draining from there
>>>>
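The handoff suggested above can be sketched with an in-process queue standing in for the shared data store (in practice that store would be Kafka, Redis, a database, etc.; the class and method names here are illustrative):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of decoupling the two pipelines: Bolt-A publishes its results to
// a shared store, and a separate spout feeding Bolt-B drains from it. A
// BlockingQueue is used here purely as a stand-in for the external store.
class SharedStoreHandoff {
    private final BlockingQueue<String> store = new LinkedBlockingQueue<>();

    // Called from the Bolt-A side: persist the intermediate result.
    void publish(String boltAResult) {
        store.offer(boltAResult);
    }

    // Called from the second pipeline's spout: take the next pending
    // result, or null if nothing is waiting.
    String drain() {
        return store.poll();
    }
}
```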
>>>> On Tue, Sep 20, 2016 at 9:20 AM, Harsh Choudhary <shry.ha...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I am thinking of doing the following.
>>>>>
>>>>> Spout subscribed to Kafka and get JSONs. Spout emits the JSONs as
>>>>> individual tuples.
>>>>>
>>>>> Bolt-A has subscribed to the spout. Bolt-A creates multiple JSONs from
>>>>> a json and emits them as multiple streams.
>>>>>
>>>>> Bolt-B receives these streams and does the computation on them.
>>>>>
>>>>> I need to produce a cumulative result from all the multiple JSONs
>>>>> (which emerged from a single JSON) in a bolt. But a bolt's static
>>>>> instance variable is only shared between tasks per worker. How do I
>>>>> achieve this syncing?
>>>>>
>>>>>                               --->
>>>>> Spout ---> Bolt-A   --->   Bolt-B  ---> Final result
>>>>>                               --->
>>>>>
>>>>> The final result is per JSON which was read from Kafka.
>>>>>
>>>>> Or is there any other way to achieve this better?
>>>>>
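The fan-out step in the topology above can be sketched as follows (names are illustrative, and the actual JSON splitting rule is a placeholder): Bolt-A tags each sub-record with the parent id and the total sibling count, so whatever aggregates downstream knows when it has seen every piece of one original JSON.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the fan-out side: one incoming JSON becomes several tagged
// sub-records. Each Part carries the parent id, its index, and the total
// number of siblings, which is what lets a downstream bolt detect when a
// cumulative result is complete.
class FanOut {
    // One emitted sub-tuple: (parentId, partIndex, totalParts, payload).
    record Part(String parentId, int index, int total, String payload) {}

    static List<Part> split(String parentId, List<String> subJsons) {
        List<Part> parts = new ArrayList<>();
        int total = subJsons.size();
        for (int i = 0; i < total; i++) {
            parts.add(new Part(parentId, i, total, subJsons.get(i)));
        }
        return parts;
    }
}
```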
>>>>
>>>>
>>>
>
