I might be off-base here (experts can chime in), but have you looked at Trident, a higher-level API built on top of Storm? It processes the stream in small batches and provides aggregation, which lets you maintain state across those micro-batches. You can even persist this partially aggregated state. If Spark's in-memory map/reduce is not fast enough for you, perhaps take a look at it.
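To make the idea concrete, here is a minimal plain-Java sketch of aggregating each micro-batch locally and then merging that partial result into stored state, which is roughly what Trident's `persistentAggregate` does against a state backend. It deliberately does not use Trident's API: `MicroBatchAggregation`, `Reading`, and `Partial` are hypothetical names, the "persisted" state is just an in-memory map, and the mean stands in for whatever per-interval computation you actually need.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch, NOT Trident's API: per-batch partial aggregation merged into stored state.
// All class and method names here are hypothetical.
class MicroBatchAggregation {

    // Hypothetical reading: the time interval it belongs to and the sensor value.
    static final class Reading {
        final long timeId;
        final double value;

        Reading(long timeId, double value) {
            this.timeId = timeId;
            this.value = value;
        }
    }

    // Running sum and count for one time id; two partials can be merged into one.
    static final class Partial {
        double sum;
        long count;

        void add(double v) { sum += v; count++; }

        void merge(Partial other) { sum += other.sum; count += other.count; }
    }

    // Stands in for a persistent state backend (an external store in a real deployment).
    private final Map<Long, Partial> persisted = new HashMap<>();

    // Aggregate one micro-batch locally, then fold the partial results into the stored state.
    void processBatch(List<Reading> batch) {
        Map<Long, Partial> local = new HashMap<>();
        for (Reading r : batch) {
            local.computeIfAbsent(r.timeId, k -> new Partial()).add(r.value);
        }
        local.forEach((timeId, p) ->
                persisted.computeIfAbsent(timeId, k -> new Partial()).merge(p));
    }

    // Mean of everything seen so far for a time id (NaN if nothing has arrived yet).
    double meanFor(long timeId) {
        Partial p = persisted.get(timeId);
        return (p == null || p.count == 0) ? Double.NaN : p.sum / p.count;
    }

    // Number of readings seen so far for a time id.
    long countFor(long timeId) {
        Partial p = persisted.get(timeId);
        return p == null ? 0 : p.count;
    }

    public static void main(String[] args) {
        MicroBatchAggregation agg = new MicroBatchAggregation();
        // Readings for time id 1 arrive split across two micro-batches.
        agg.processBatch(List.of(new Reading(1, 10.0), new Reading(1, 20.0), new Reading(2, 5.0)));
        agg.processBatch(List.of(new Reading(1, 30.0)));
        System.out.println(agg.meanFor(1) + " over " + agg.countFor(1) + " readings"); // 20.0 over 3 readings
        System.out.println(agg.meanFor(2)); // 5.0
    }
}
```

Because only the small (sum, count) partial is written per batch, the stored state stays correct even when one time interval's readings straddle several micro-batches.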
Regards,
Shahab

On Fri, Jun 6, 2014 at 6:40 PM, Jonathan Poon <jkp...@ucdavis.edu> wrote:
> Hi Kyle,
>
> I'm looking for a real-time batch processing tool. In my case, I'm
> looking to make correlations between all of the sensors at each time
> interval.
>
> I could use Hadoop (MapReduce), but it requires that I collect all of
> the data before I can batch process each time partition of data from each
> sensor.
>
> Another tool I'm also looking at is Spark Streaming, which allows me to
> collect data at different time intervals and process each batch of data
> using MapReduce.
>
> However, MapReduce seems inefficient because my sensor data is already
> naturally time-sorted. In addition, I would like real-time data on the fly.
>
> Storm seems like it might be a candidate for this application. Please let me
> know what you think! Thanks for your help!
>
> Jonathan
>
> On Fri, Jun 6, 2014 at 3:32 PM, Kyle Nusbaum <knusb...@yahoo-inc.com>
> wrote:
>
>> You could send a signal tuple from the spout when it knows it has sent
>> the last tuple for a time period, or include a field in the tuple
>> indicating that it's the last member.
>>
>> I'm curious about why you want to do this, since the purpose of Storm is
>> to facilitate stream processing rather than the type of batch processing
>> you're describing.
>>
>> -- Kyle
>>
>> On 06/06/2014 05:14 PM, Jonathan Poon wrote:
>>
>> Hi Nathan,
>>
>> The sensor data I have is naturally time-sorted, since it's just
>> collecting data and emitting it to a spout. Is it possible for a bolt to
>> know when all of the tuples with the same time tag have been collected and
>> to start processing them together? Or is it only possible for a bolt to
>> process each tuple one at a time?
>>
>> Thanks!
>>
>> On Fri, Jun 6, 2014 at 3:07 PM, Nathan Leung <ncle...@gmail.com> wrote:
>>
>>> You can have your bolt subscribe to the spout using fields grouping and
>>> use the time tag as your key.
>>> On Jun 6, 2014 6:01 PM, "Jonathan Poon" <jkp...@ucdavis.edu> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> I'm currently investigating different data processing tools for an
>>>> application I'm interested in. I have many sensors that I collect data
>>>> from. However, I would like to group the data from every sensor at
>>>> predefined time intervals and process it together.
>>>>
>>>> Using Storm terminology, I would have each sensor send data to a
>>>> spout. The spouts would then send tuples to a specific bolt that will
>>>> process all of the data within a specific time partition. Each spout will
>>>> tag each event with a time id, and each bolt will process data after
>>>> collecting all of the data with the same time id tag.
>>>>
>>>> Is this possible with Storm?
>>>>
>>>> I appreciate your help!
>>>>
>>>> Jonathan
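Nathan's and Kyle's suggestions fit together: grouping on the time tag sends every tuple for a given time id to the same bolt task, and an end-of-window marker tells that task when the id is complete. The sketch below simulates both in plain Java rather than against Storm's API. `TimeWindowRouting`, `SensorTuple`, `WindowBolt`, the hash-modulo routing in `chooseTask`, and the mean in `process` are hypothetical stand-ins, and it assumes a single spout that emits each time id's marker after all of that id's readings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation, NOT Storm's API: hash-routing on the time id (the property that
// fields grouping gives you) plus an end-of-window marker tuple. All names are hypothetical.
class TimeWindowRouting {

    // Hypothetical tuple: either a sensor reading or a marker meaning "this time id is complete".
    static final class SensorTuple {
        final String sensorId;
        final long timeId;
        final double value;
        final boolean endOfWindow;

        private SensorTuple(String sensorId, long timeId, double value, boolean endOfWindow) {
            this.sensorId = sensorId;
            this.timeId = timeId;
            this.value = value;
            this.endOfWindow = endOfWindow;
        }

        static SensorTuple data(String sensorId, long timeId, double value) {
            return new SensorTuple(sensorId, timeId, value, false);
        }

        static SensorTuple marker(long timeId) {
            return new SensorTuple(null, timeId, 0.0, true);
        }
    }

    // One simulated bolt task: buffers readings per time id until that id's marker arrives.
    static final class WindowBolt {
        private final Map<Long, Map<String, Double>> buffers = new HashMap<>();
        private final Map<Long, Double> results;

        WindowBolt(Map<Long, Double> results) { this.results = results; }

        void execute(SensorTuple t) {
            if (t.endOfWindow) {
                Map<String, Double> window = buffers.remove(t.timeId);
                if (window != null) {
                    results.put(t.timeId, process(window));
                }
            } else {
                buffers.computeIfAbsent(t.timeId, k -> new HashMap<>()).put(t.sensorId, t.value);
            }
        }

        // Placeholder for the real per-interval work (e.g. cross-sensor correlation): here, the mean.
        static double process(Map<String, Double> readingsBySensor) {
            double sum = 0.0;
            for (double v : readingsBySensor.values()) sum += v;
            return sum / readingsBySensor.size();
        }
    }

    // Same time id -> same task, so a window's readings and its marker meet in one bolt instance.
    static int chooseTask(long timeId, int numTasks) {
        return Math.floorMod(Long.hashCode(timeId), numTasks);
    }

    // Deliver the stream, in order, to numTasks simulated bolt tasks; returns results per time id.
    static Map<Long, Double> run(List<SensorTuple> stream, int numTasks) {
        Map<Long, Double> results = new TreeMap<>();
        List<WindowBolt> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) tasks.add(new WindowBolt(results));
        for (SensorTuple t : stream) tasks.get(chooseTask(t.timeId, numTasks)).execute(t);
        return results;
    }

    public static void main(String[] args) {
        List<SensorTuple> stream = List.of(
                SensorTuple.data("a", 1, 1.0),
                SensorTuple.data("b", 1, 3.0),
                SensorTuple.data("a", 2, 10.0),
                SensorTuple.marker(1), // the spout knows time id 1 is complete
                SensorTuple.data("b", 2, 20.0),
                SensorTuple.marker(2));
        System.out.println(run(stream, 2)); // {1=2.0, 2=15.0}
    }
}
```

In a real topology the routing half is just a fields grouping on the time-tag field when declaring the bolt's input. The marker is the fragile half: arrival order across different spout tasks is not coordinated, so with several spout tasks the bolt would need one marker per upstream task before closing a window, and that bookkeeping is left out here.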