Re: Syncing multiple streams to compute final result from a bolt

2016-09-23 Thread Harsh Choudhary
Thanks for all the help. :) On Wed, Sep 21, 2016 at 11:56 AM, Harsh Choudhary wrote: > It is real-time. I get streaming JSONs from Kafka. > > > > > On Wed, Sep 21, 2016 at 4:15 AM, Ambud Sharma > wrote: > >> Is this real-time or batch? >> >> If

Re: Syncing multiple streams to compute final result from a bolt

2016-09-21 Thread Harsh Choudhary
It is real-time. I get streaming JSONs from Kafka. On Wed, Sep 21, 2016 at 4:15 AM, Ambud Sharma wrote: > Is this real-time or batch? > > If batch this is perfect for MapReduce or Spark. > > If real-time then you should use Spark or Storm Trident. > > On Sep 20, 2016

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Ambud Sharma
Is this real-time or batch? If batch this is perfect for MapReduce or Spark. If real-time then you should use Spark or Storm Trident. On Sep 20, 2016 9:39 AM, "Harsh Choudhary" wrote: > My use case is that I have a json which contains an array. I need to split > that

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Harsh Choudhary
My use case is that I have a json which contains an array. I need to split that array into multiple jsons and do some computations on them. After that, results from each json has to be used in further calculation altogether and come up with the final result. *Cheers!* Harsh Choudhary / Software

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Ambud Sharma
What's your use case? The complexities can be manage d as long as your tuple branching is reasonable I.e. 1 tuple creates several other tuples and you need to sync results between them. On Sep 20, 2016 8:19 AM, "Harsh Choudhary" wrote: > You're right. For that I have to

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Harsh Choudhary
You're right. For that I have to manage the queue and all those complexities of timeout. If Storm is not the right place to do this then what else? On Tue, Sep 20, 2016 at 8:25 PM, Ambud Sharma wrote: > The correct way is to perform time window aggregation using

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Ambud Sharma
The correct way is to perform time window aggregation using bucketing. Use the timestamp on your event computed from.various stages and send it to a single bolt where the aggregation happens. You only emit from this bolt once you receive results from both parts. It's like creating a barrier or

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Harsh Choudhary
But how would that solve the syncing problem? On Tue, Sep 20, 2016 at 8:12 PM, Alberto São Marcos wrote: > I would dump the *Bolt-A* results in a shared-data-store/queue and have a > separate workflow with another spout and Bolt-B draining from there > > On Tue, Sep 20,

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Alberto São Marcos
I would dump the *Bolt-A* results in a shared-data-store/queue and have a separate workflow with another spout and Bolt-B draining from there On Tue, Sep 20, 2016 at 9:20 AM, Harsh Choudhary wrote: > Hi > > I am thinking of doing the following. > > Spout subscribed to