>> Now because of any reason machine with hourly aggregated data goes down I
>> want missing hour tuples to replay from my queue.

An hour is too long, since the Storm spout would time out well before that. Even though the timeout is configurable, I do not think that would be the right way to do it.

There are many NoSQL products that come to mind that can perform aggregations (CouchBase, ElasticSearch, and most other K/V-type NoSQL stores). You would put Storm in front of a NoSQL store to reduce the data throughput (event consolidation), to perform extremely non-standard aggregations (ones not covered by a simple map-reduce script), or if you must get real-time stats.

Since you said your stats are not real-time, this leaves us with the following questions:

1. What is your raw event throughput?
2. What type of aggregations are you trying to perform?

Regards,
Itai

________________________________
From: Nipun Batra <[email protected]>
Sent: Wednesday, October 15, 2014 9:28 AM
To: [email protected]
Subject: Re: Batch ID TxId

Hi Yuval,

Thanks for responding. Here is what I have in mind:

I was thinking of aggregating the data on an hourly basis in memory and persisting it every hour.
Now, if for any reason the machine holding the hourly aggregated data goes down, I want the missing hour's tuples to be replayed from my queue.

Any suggestions?

Regards,
Nipun

On Tue, Oct 14, 2014 at 4:33 PM, Yuval Oren <[email protected]> wrote:

Nipun,

That seems to be contrary to the typical Storm pattern of continuous processing. Is there a reason you can't continuously read new data? That might also scale better.

--
Yuval Oren
N3TWORK

On Oct 14, 2014, at 8:52 AM, Nipun Batra <[email protected]> wrote:

Hi,

I have a never-ending data feed and I want to define a batch on an hourly basis, i.e. set a batch ID for all the tuples coming in during a particular hour.

If I write my own custom spout, how do I set the batch ID / TxId?

Later the data feed will be consumed from a Kafka topic. If I plan to use the Kafka spout, is there again a way to batch or set the TxId by hour?

I have looked at many examples but I have not been able to find one.
I would appreciate it if you could point me in the right direction or to any example of a custom spout setting a batch ID.

I apologize if this has already been asked; I tried to look around but found nothing.

Thank you in advance,
Nipun
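
For reference, the hourly batching discussed in this thread can be sketched without any Storm API at all. The following is only a minimal illustration, assuming each event carries an epoch-millisecond timestamp and that the aggregation is a simple count. The class `HourlyBuckets` and its methods are hypothetical names, not part of Storm or the Kafka spout. It derives a batch ID from the event's hour and hands back only the hours that have already closed, which is the part that would be persisted.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Storm: groups events into hourly buckets.
final class HourlyBuckets {
    static final long HOUR_MS = 3_600_000L;

    // Running event count per hour bucket, kept sorted by batch ID.
    private final TreeMap<Long, Long> counts = new TreeMap<>();

    // Batch ID for an event: number of whole hours since the Unix epoch (UTC).
    static long batchId(long eventTimeMillis) {
        return Math.floorDiv(eventTimeMillis, HOUR_MS);
    }

    // Add one event to the bucket for its hour.
    void add(long eventTimeMillis) {
        counts.merge(batchId(eventTimeMillis), 1L, Long::sum);
    }

    // Remove and return every bucket that ended before the hour containing
    // nowMillis. The caller would persist these; the open hour stays in memory.
    Map<Long, Long> drainClosed(long nowMillis) {
        Map<Long, Long> closedView = counts.headMap(batchId(nowMillis), false);
        Map<Long, Long> closed = new TreeMap<>(closedView);
        closedView.clear(); // also removes the entries from the backing map
        return closed;
    }

    public static void main(String[] args) {
        HourlyBuckets buckets = new HourlyBuckets();
        buckets.add(0L);            // first hour
        buckets.add(1_000L);        // first hour
        buckets.add(HOUR_MS + 5L);  // second hour, still open below
        System.out.println(buckets.drainClosed(HOUR_MS + 10L));
    }
}
```

If the highest persisted batch ID is stored alongside the aggregates, a restarted process could in principle work out which hours are missing and ask the queue to replay from there. How that replay is triggered depends on the queue and spout in use and is not shown here.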
