What's your use case? The complexities can be managed as long as your tuple branching is reasonable, i.e. one tuple creates several other tuples and you need to sync results between them.
On Sep 20, 2016 8:19 AM, "Harsh Choudhary" <shry.ha...@gmail.com> wrote:

> You're right. For that I have to manage the queue and all those
> complexities of timeouts. If Storm is not the right place to do this, then
> what else?
>
> On Tue, Sep 20, 2016 at 8:25 PM, Ambud Sharma <asharma52...@gmail.com> wrote:
>
>> The correct way is to perform time-window aggregation using bucketing.
>>
>> Use the timestamp on your event, computed from the various stages, and send it
>> to a single bolt where the aggregation happens. You only emit from this
>> bolt once you receive results from both parts.
>>
>> It's like creating a barrier, or the join phase of a fork-join pool.
>>
>> That said, the more important question is: is Storm the right place to do
>> this? When you perform time-window aggregation you are susceptible to tuple
>> timeouts, and you also have to make sure your aggregation is idempotent.
>>
>> On Sep 20, 2016 7:49 AM, "Harsh Choudhary" <shry.ha...@gmail.com> wrote:
>>
>>> But how would that solve the syncing problem?
>>>
>>> On Tue, Sep 20, 2016 at 8:12 PM, Alberto São Marcos <alberto....@gmail.com> wrote:
>>>
>>>> I would dump the *Bolt-A* results in a shared data store/queue and
>>>> have a separate workflow, with another spout and Bolt-B, draining from there.
>>>>
>>>> On Tue, Sep 20, 2016 at 9:20 AM, Harsh Choudhary <shry.ha...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I am thinking of doing the following.
>>>>>
>>>>> A spout subscribed to Kafka gets JSONs. The spout emits the JSONs as
>>>>> individual tuples.
>>>>>
>>>>> Bolt-A is subscribed to the spout. Bolt-A creates multiple JSONs from
>>>>> a single JSON and emits them as multiple streams.
>>>>>
>>>>> Bolt-B receives these streams and does the computation on them.
>>>>>
>>>>> I need to build a cumulative result from all the multiple JSONs (which
>>>>> emerged from a single JSON) in a bolt. But a bolt's static instance
>>>>> variable is only shared between tasks per worker. How do I achieve this
>>>>> syncing?
>>>>>
>>>>>       --->
>>>>> Spout ---> Bolt-A ---> Bolt-B ---> Final result
>>>>>       --->
>>>>>
>>>>> The final result is per JSON read from Kafka.
>>>>>
>>>>> Or is there any other way to achieve this better?
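The barrier/fork-join aggregation Ambud describes can be sketched outside Storm as a plain class (a sketch only: the names `JoinAggregator`, `parent_id`, and `expected_parts` are my own, not from the thread). It assumes Bolt-A tags each derived tuple with its parent JSON's id and the total part count, and that a fields grouping on the parent id routes all parts of one JSON to the same downstream task. A real bolt would also need an eviction/timeout policy for incomplete groups, per the tuple-timeout concern above.

```python
from collections import defaultdict

class JoinAggregator:
    """Buffers partial results per parent id; emits once all parts arrive.

    Hypothetical sketch of the barrier/join phase, not real Storm API.
    """

    def __init__(self):
        # parent_id -> list of partial results received so far
        self.partials = defaultdict(list)

    def accept(self, parent_id, expected_parts, result):
        """Record one partial result.

        Returns the full list of results once all expected parts have
        arrived for this parent id, otherwise None (the barrier holds).
        """
        bucket = self.partials[parent_id]
        bucket.append(result)
        if len(bucket) == expected_parts:
            del self.partials[parent_id]  # group complete; free the buffer
            return bucket
        return None

agg = JoinAggregator()
print(agg.accept("msg-1", 2, {"part": "a"}))  # None: barrier still holding
print(agg.accept("msg-1", 2, {"part": "b"}))  # complete: both parts returned
```

In a Storm topology this state would live inside the joining bolt's `execute` method (per task, not static), which is why the grouping on the parent id matters: it guarantees every part of one JSON lands on the same task's buffer.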