Hi Kyle,

I have a scientific application with thousands of sensors spitting out approximately 500,000 events per second. I'm looking for a tool that can process these events in real time. In my application, I need to read events from each of the sensors and compute correlations between the sensors based on the data. As you described it, I pretty much need a real-time batch processing tool.
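The thread never says which correlation measure is meant. Here is a minimal sketch of the per-slice work, assuming Pearson correlation and assuming every sensor contributes the same number of aligned samples to a time slice. The array layout and the class name `SliceCorrelation` are hypothetical, and the code is plain Java with no Storm dependency:

```java
import java.util.Arrays;

public class SliceCorrelation {

    // Pearson correlation coefficient between two equally long sample arrays.
    // Returns NaN if either sensor is constant within the slice (zero variance).
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = Arrays.stream(x).average().orElse(0.0);
        double meanY = Arrays.stream(y).average().orElse(0.0);
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    // Builds the symmetric sensor-by-sensor correlation matrix for one time slice.
    // samples[s] holds the readings of sensor s within the slice (hypothetical layout).
    static double[][] correlationMatrix(double[][] samples) {
        int sensors = samples.length;
        double[][] r = new double[sensors][sensors];
        for (int a = 0; a < sensors; a++) {
            r[a][a] = 1.0;
            for (int b = a + 1; b < sensors; b++) {
                r[a][b] = r[b][a] = pearson(samples[a], samples[b]);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        double[][] slice = {
            {1.0, 2.0, 3.0, 4.0},  // sensor 0
            {2.0, 4.0, 6.0, 8.0},  // sensor 1: rises with sensor 0
            {4.0, 3.0, 2.0, 1.0}   // sensor 2: falls as sensor 0 rises
        };
        System.out.println(Arrays.deepToString(correlationMatrix(slice)));
    }
}
```

Because every pair of sensors is compared, the work per slice grows with the square of the sensor count. At 1,000 sensors that is already 499,500 pairs per slice, which is the kind of cost that can make a slice's computation outlast the slice itself.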
For my application, the computation for each time slice is expected to take much longer than the time slice itself. I need a highly scalable tool that can read all of the data and process the data from each time slice in parallel.

Jonathan

On Fri, Jun 6, 2014 at 5:04 PM, Jonathan Poon <jkp...@ucdavis.edu> wrote:

> I will take a look into Trident as well. Thanks for the tip!
>
>
> On Fri, Jun 6, 2014 at 3:53 PM, Kyle Nusbaum <knusb...@yahoo-inc.com>
> wrote:
>
>> Sounds interesting.
>>
>> I don't know much about your project, so I won't speculate about your
>> purposes.
>>
>> One thing to consider is that the duration of the computation on a time
>> slice must be longer than the time slice itself to really make this type of
>> setup worthwhile. Otherwise you could just feed the batches through the
>> same bolt, since it would be done processing a batch before the next one
>> comes in.
>>
>> -- Kyle
>>
>> On 06/06/2014 05:40 PM, Jonathan Poon wrote:
>>
>> Hi Kyle,
>>
>> I'm looking for a real-time batch processing tool. In my case, I'm
>> looking to make correlations between all of the sensors at each time
>> interval.
>>
>> I could use Hadoop (MapReduce), but it requires me to collect all
>> of the data before I can batch process each time partition of data from
>> each sensor.
>>
>> Another tool I'm also looking at is Spark Streaming, which allows me to
>> collect data at different time intervals and process that batch of data
>> using MapReduce.
>>
>> However, MapReduce seems inefficient because my sensor data is already
>> time sorted naturally. In addition, I would like real-time data on the fly.
>>
>> Seems like Storm might be a candidate for this application. Please let
>> me know what you think! Thanks for your help!
>>
>> Jonathan
>>
>>
>> On Fri, Jun 6, 2014 at 3:32 PM, Kyle Nusbaum <knusb...@yahoo-inc.com>
>> wrote:
>>
>>> You could send a signal tuple from the spout when it knows it's sent
>>> the last tuple for a time period, or include a field in the tuple
>>> indicating it's the last member.
>>>
>>> I'm curious about why you want to do this, since the purpose of Storm is
>>> to facilitate stream processing rather than the type of batch processing
>>> you're describing.
>>>
>>> -- Kyle
>>>
>>> On 06/06/2014 05:14 PM, Jonathan Poon wrote:
>>>
>>> Hi Nathan,
>>>
>>> The sensor data I have is naturally time sorted, since it's just
>>> collecting data and emitting it to a spout. Is it possible for a bolt to
>>> know when all of the tuples with the same time tag have been collected and
>>> to start processing them together? Or is it only possible for a bolt to
>>> process each tuple one at a time?
>>>
>>> Thanks!
>>>
>>>
>>> On Fri, Jun 6, 2014 at 3:07 PM, Nathan Leung <ncle...@gmail.com> wrote:
>>>
>>>> You can have your bolt subscribe to the spout using fields grouping and
>>>> use the time tag as your key.
>>>> On Jun 6, 2014 6:01 PM, "Jonathan Poon" <jkp...@ucdavis.edu> wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> I'm currently investigating different data processing tools for an
>>>>> application I'm interested in. I have many sensors that I collect data
>>>>> from. However, I would like to group the data from every sensor at
>>>>> predefined time intervals and process it together.
>>>>>
>>>>> Using Storm terminology, I would have each sensor send data to a
>>>>> spout. The spouts would then send tuples to a specific bolt that will
>>>>> process all of the data within a specific time partition. Each spout will
>>>>> tag each event with a time id and each bolt will process data after
>>>>> collecting all of the data with the same time id tags.
>>>>>
>>>>> Is this possible with Storm?
>>>>>
>>>>> I appreciate your help!
>>>>>
>>>>> Jonathan
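Nathan's and Kyle's suggestions combine naturally: route on the time tag so one task sees a whole slice, and flag the last event each producer sends for that slice. Below is a minimal plain-Java sketch of that buffering logic, written outside Storm so it runs standalone. `SensorEvent`, `TimeSliceBatcher`, `taskFor`, and the `lastInSlice` flag are hypothetical names. The hash-modulo routing only mimics the idea behind fields grouping and is not Storm's exact hashing. The sketch also assumes the bolt knows how many upstream producers (for example, spout tasks) will each send an end-of-slice flag. With several spouts, a single flag is not enough to know the slice is complete.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class TimeSliceBatcher {

    // Hypothetical event shape: sensor, time slice, reading, and a flag the
    // producer sets on the last event it sends for that slice. (Java 16+ record.)
    record SensorEvent(int sensorId, long timeId, double value, boolean lastInSlice) {}

    // Picks a downstream task from the time tag so every event with the same
    // timeId reaches the same task (the idea behind grouping on a key field).
    static int taskFor(long timeId, int numTasks) {
        return Math.floorMod(Long.hashCode(timeId), numTasks);
    }

    private final int expectedProducers;  // upstream sources that each send one end flag per slice
    private final Map<Long, List<SensorEvent>> buffers = new HashMap<>();
    private final Map<Long, Integer> finishedProducers = new HashMap<>();
    private final BiConsumer<Long, List<SensorEvent>> onSliceComplete;

    TimeSliceBatcher(int expectedProducers, BiConsumer<Long, List<SensorEvent>> onSliceComplete) {
        this.expectedProducers = expectedProducers;
        this.onSliceComplete = onSliceComplete;
    }

    // Buffers one event; once every producer has flagged its last event for
    // this timeId, hands the whole slice to the callback and frees the buffer.
    void onEvent(SensorEvent e) {
        buffers.computeIfAbsent(e.timeId(), k -> new ArrayList<>()).add(e);
        if (e.lastInSlice()) {
            int done = finishedProducers.merge(e.timeId(), 1, Integer::sum);
            if (done == expectedProducers) {
                List<SensorEvent> slice = buffers.remove(e.timeId());
                finishedProducers.remove(e.timeId());
                onSliceComplete.accept(e.timeId(), slice);
            }
        }
    }

    public static void main(String[] args) {
        TimeSliceBatcher batcher = new TimeSliceBatcher(2,
            (timeId, slice) -> System.out.println("slice " + timeId + ": " + slice.size() + " events"));
        batcher.onEvent(new SensorEvent(0, 7L, 1.5, false));
        batcher.onEvent(new SensorEvent(1, 7L, 2.5, false));
        batcher.onEvent(new SensorEvent(0, 7L, 1.7, true));
        batcher.onEvent(new SensorEvent(1, 7L, 2.9, true));  // prints "slice 7: 4 events"
    }
}
```

This keeps all buffered state in memory and assumes no event for a slice arrives after its producer's end flag. A failed task would lose its open slices. Trident, mentioned earlier in the thread, is Storm's higher-level batch abstraction and may be a better fit when batch-level guarantees matter.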