I might be off base here (experts can chime in), but have you looked at
Trident, which is a higher-level API built on top of Storm? It provides
aggregation over micro-batches, which is another way of saying it lets you
maintain state. You can even persist this partially aggregated state. If
Spark's in-memory M/R is not fast enough for you, then perhaps take a look
at that.
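
A minimal plain-Java sketch of the idea above: each micro-batch is first reduced to per-key partial aggregates, and those partials are then folded into long-lived state. This does not use the Trident API. In Trident this is usually expressed with a grouped stream and `persistentAggregate`, so check the Trident docs for the real usage. The class and method names here are hypothetical, and the `HashMap` is a stand-in for a durable state store.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only (no Storm/Trident dependency): micro-batch aggregation with
// partial results merged into persisted state once per batch.
class MicroBatchAggregator {
    // One reading; "key" is whatever the stream is grouped by (e.g. a time id).
    static final class Reading {
        final String key;
        final double value;

        Reading(String key, double value) {
            this.key = key;
            this.value = value;
        }
    }

    // Stand-in for a durable store: key -> running sum across all batches so far.
    private final Map<String, Double> state = new HashMap<>();

    void processBatch(List<Reading> batch) {
        // 1. Aggregate within the micro-batch (the partial aggregate).
        Map<String, Double> partial = new HashMap<>();
        for (Reading r : batch) {
            partial.merge(r.key, r.value, Double::sum);
        }
        // 2. Fold the partial aggregate into the persisted state.
        partial.forEach((k, v) -> state.merge(k, v, Double::sum));
    }

    double get(String key) {
        return state.getOrDefault(key, 0.0);
    }

    public static void main(String[] args) {
        MicroBatchAggregator agg = new MicroBatchAggregator();
        agg.processBatch(List.of(new Reading("t0", 1.5), new Reading("t0", 2.5)));
        agg.processBatch(List.of(new Reading("t0", 1.0), new Reading("t1", 4.0)));
        System.out.println(agg.get("t0") + " " + agg.get("t1")); // prints "5.0 4.0"
    }
}
```

Writing to the store once per batch instead of once per tuple is what makes the micro-batch approach cheaper than updating state on every event.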

Regards,
Shahab


On Fri, Jun 6, 2014 at 6:40 PM, Jonathan Poon <jkp...@ucdavis.edu> wrote:

> Hi Kyle,
>
> I'm looking for a real-time batch processing tool.  In my case, I want
> to find correlations between all of the sensors at each time
> interval.
>
> I could use Hadoop (Map Reduce), but it requires that I collect all of
> the data before I can batch process each time partition of data from each
> sensor.
>
> Another tool I'm also looking at is Spark Streaming, which lets me
> collect data at different time intervals and process each batch of data
> using Map Reduce.
>
> However, Map Reduce seems inefficient because my sensor data is already
> naturally time sorted.  In addition, I would like to process the data in
> real time, on the fly.
>
> Seems like Storm might be a candidate for this application.  Please let me
> know what you think...!  Thanks for your help!
>
> Jonathan
>
>
>
>
> On Fri, Jun 6, 2014 at 3:32 PM, Kyle Nusbaum <knusb...@yahoo-inc.com>
> wrote:
>
>>  You could send a signal tuple from the spout when it knows it has sent
>> the last tuple for a time period, or include a field in the tuple
>> indicating that it's the last member.
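
A minimal plain-Java sketch of the signal-tuple suggestion, assuming the bolt knows how many upstream sources (hypothetical `expectedSources`) will each send an end-of-period marker per time id. It deliberately avoids the Storm API: in a real bolt, `onData`/`onEndOfPeriod` would be driven from `execute`, and the callback would be replaced by emitting through the bolt's collector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch only (not Storm API): buffer tuples per time id and hand the whole
// partition off once every source has signalled the end of that period.
class TimePartitionBuffer {
    private final int expectedSources; // hypothetical: number of sources sending markers
    private final Map<Long, List<Double>> buffers = new HashMap<>();
    private final Map<Long, Integer> markersSeen = new HashMap<>();
    private final BiConsumer<Long, List<Double>> onComplete;

    TimePartitionBuffer(int expectedSources, BiConsumer<Long, List<Double>> onComplete) {
        this.expectedSources = expectedSources;
        this.onComplete = onComplete;
    }

    // Ordinary data tuple: buffer its value under its time id.
    void onData(long timeId, double value) {
        buffers.computeIfAbsent(timeId, k -> new ArrayList<>()).add(value);
    }

    // Signal tuple: one source says it has sent everything for timeId.
    void onEndOfPeriod(long timeId) {
        int seen = markersSeen.merge(timeId, 1, Integer::sum);
        if (seen == expectedSources) {
            markersSeen.remove(timeId);
            List<Double> batch = buffers.remove(timeId);
            if (batch == null) {
                batch = new ArrayList<>(); // no data arrived for this period
            }
            onComplete.accept(timeId, batch);
        }
    }

    public static void main(String[] args) {
        TimePartitionBuffer buf = new TimePartitionBuffer(2,
                (timeId, values) -> System.out.println("time " + timeId + ": " + values));
        buf.onData(7, 1.0);   // from source A
        buf.onData(7, 2.0);   // from source B
        buf.onEndOfPeriod(7); // A is done; still waiting on B
        buf.onEndOfPeriod(7); // B is done; prints "time 7: [1.0, 2.0]"
    }
}
```

For the "last member" variant, a tuple carrying the flag would simply be passed to `onData` followed by `onEndOfPeriod` for the same time id.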
>>
>> I'm curious about why you want to do this, since the purpose of storm is
>> to facilitate stream processing rather than the type of batch processing
>> you're describing.
>>
>> -- Kyle
>>
>> On 06/06/2014 05:14 PM, Jonathan Poon wrote:
>>
>>  Hi Nathan,
>>
>>  The sensor data I have is naturally time sorted, since it's just
>> collecting data and emitting it to a spout. Is it possible for a bolt to
>> know when all of the tuples with the same time tag have been collected and
>> to start processing it together?  Or is it only possible for a bolt to
>> process each tuple one at a time?
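
Because each sensor's stream is already time sorted, another option, sketched here under stated assumptions rather than taken from the replies, is to treat a time id as complete once every sensor has sent a tuple with a later time id (a "low watermark"). This assumes the bolt knows the full set of sensor ids and that each sensor's tuples arrive in time order with no replays; a sensor that goes silent will stall completion. All names are hypothetical and no Storm API is used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch only (no Storm API): close a time id once every known sensor has
// moved past it. Relies on each sensor's tuples arriving in time order.
class SortedStreamCompleter {
    private final Set<String> sensors;                                   // hypothetical: all sensor ids
    private final Map<String, Long> latestTimeId = new HashMap<>();      // newest time id per sensor
    private final TreeMap<Long, List<Double>> pending = new TreeMap<>(); // buffered values by time id

    SortedStreamCompleter(Set<String> sensors) {
        this.sensors = sensors;
    }

    // Buffers one reading and returns every time id that is now complete,
    // oldest first, together with its buffered values.
    Map<Long, List<Double>> onData(String sensorId, long timeId, double value) {
        if (!sensors.contains(sensorId)) {
            throw new IllegalArgumentException("unknown sensor: " + sensorId);
        }
        pending.computeIfAbsent(timeId, k -> new ArrayList<>()).add(value);
        latestTimeId.merge(sensorId, timeId, Long::max);

        Map<Long, List<Double>> complete = new LinkedHashMap<>();
        if (latestTimeId.size() < sensors.size()) {
            return complete; // some sensor has not reported yet
        }
        // Every sensor has reached at least the watermark, so (given sorted
        // input) no more data can arrive for any earlier time id.
        long watermark = Collections.min(latestTimeId.values());
        while (!pending.isEmpty() && pending.firstKey() < watermark) {
            Map.Entry<Long, List<Double>> oldest = pending.pollFirstEntry();
            complete.put(oldest.getKey(), oldest.getValue());
        }
        return complete;
    }

    public static void main(String[] args) {
        SortedStreamCompleter c = new SortedStreamCompleter(Set.of("a", "b"));
        System.out.println(c.onData("a", 1, 0.1)); // prints {}
        System.out.println(c.onData("b", 1, 0.2)); // prints {} (watermark is 1)
        System.out.println(c.onData("a", 2, 0.3)); // prints {} ("b" is still at 1)
        System.out.println(c.onData("b", 2, 0.4)); // prints {1=[0.1, 0.2]}
    }
}
```

Compared with explicit end-of-period markers, this needs no extra tuples but delays each partition until the slowest sensor advances.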
>>
>>  Thanks!
>>
>>
>>
>> On Fri, Jun 6, 2014 at 3:07 PM, Nathan Leung <ncle...@gmail.com> wrote:
>>
>>> You can have your bolt subscribe to the spout using fields grouping and
>>> use the time tag as your key.
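
A plain-Java illustration of the guarantee fields grouping gives: tuples with equal values in the grouping field always go to the same bolt task. In a topology this would be wired with something like `builder.setBolt("correlator", new CorrelatorBolt(), 4).fieldsGrouping("sensor-spout", new Fields("timeId"))`, where the component ids, the bolt class, and the `timeId` field name are hypothetical. Storm computes the task index with its own internal hashing; `hashCode` plus `floorMod` below is only a stand-in that shows the property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Sketch only: hash-based routing that keeps equal keys on the same task.
// Storm's real task selection uses its own hashing, not this exact formula.
class FieldsGroupingDemo {
    static int chooseTask(Object groupingKey, int numTasks) {
        return Math.floorMod(Objects.hashCode(groupingKey), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // e.g. the bolt's parallelism hint
        long[] timeIds = {100, 101, 100, 102, 101, 100};
        Map<Integer, List<Long>> routed = new TreeMap<>();
        for (long t : timeIds) {
            routed.computeIfAbsent(chooseTask(t, numTasks), k -> new ArrayList<>()).add(t);
        }
        // Each time id appears under exactly one task index:
        // prints {0=[100, 100, 100], 1=[101, 101], 2=[102]}
        System.out.println(routed);
    }
}
```

This only guarantees co-location. One task can still receive several time ids, and nothing here tells the bolt when a time id is finished, which is what Jonathan asks next.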
>>>  On Jun 6, 2014 6:01 PM, "Jonathan Poon" <jkp...@ucdavis.edu> wrote:
>>>
>>>>    Hi Everyone,
>>>>
>>>>  I'm currently investigating different data processing tools for an
>>>> application I'm interested in.  I have many sensors that I collect data
>>>> from.  However, I would like to group the data from every sensor at
>>>> predefined time intervals and process it together.
>>>>
>>>>  Using Storm terminology, I would have each sensor send data to a
>>>> spout.  The spouts would then send tuples to a specific bolt that will
>>>> process all of the data within a specific time partition.  Each spout will
>>>> tag each event with a time id, and each bolt will process the data after
>>>> collecting all of the data with the same time id tag.
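
One way to produce the time id tag described above is to bucket each event's timestamp into fixed, non-overlapping intervals. This is a minimal sketch assuming millisecond timestamps and reasonably synchronized sensor clocks; the 5-second interval and the `TimeIdTagger` name are hypothetical choices for illustration.

```java
import java.util.concurrent.TimeUnit;

// Sketch: derive a time id by bucketing timestamps into fixed intervals.
// The interval length is an assumption; pick whatever partition size you need.
class TimeIdTagger {
    private final long intervalMillis;

    TimeIdTagger(long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.intervalMillis = intervalMillis;
    }

    // floorDiv keeps bucketing consistent even for timestamps before the epoch.
    long timeId(long timestampMillis) {
        return Math.floorDiv(timestampMillis, intervalMillis);
    }

    // Start of the interval covered by a time id, handy for labelling output.
    long intervalStart(long timeId) {
        return timeId * intervalMillis;
    }

    public static void main(String[] args) {
        TimeIdTagger tagger = new TimeIdTagger(TimeUnit.SECONDS.toMillis(5));
        System.out.println(tagger.timeId(12_345)); // prints 2 (covers 10000..14999 ms)
        System.out.println(tagger.timeId(14_999)); // prints 2
        System.out.println(tagger.timeId(15_000)); // prints 3
    }
}
```

Every spout computing the tag the same way is what lets a downstream grouping on that field bring one interval's events together.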
>>>>
>>>>  Is this possible with Storm?
>>>>
>>>>  I appreciate your help!
>>>>
>>>>  Jonathan
>>>>
>>>
>>
>>
>
