Hi Kyle,

I have a scientific application that has thousands of sensors spitting out
approximately 500,000 events per second.  I'm looking for a tool that can
process these events in a real-time manner.  In my application, I need to
read events from each of the sensors and create correlations between the
sensors based on the data.  As you described it, I pretty much need a
real-time batch processing tool.

For my application, the computation for each time slice should be much
longer than the time slice itself.  I need a highly scalable tool that can
read all of the data and process the data from each time slice in parallel.
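
To make the idea concrete, here is a minimal sketch in plain Python (not the
Storm API) of the pattern discussed in this thread: events carry a time id, an
end-of-slice marker signals that a slice is complete, and completed slices are
handed to a pool so several can be computed at once. All names here
(SliceCollector, END_OF_SLICE, correlate, workers_needed, run) are
hypothetical, and correlate is only a placeholder for the real computation.
In Storm itself, routing by time id would presumably be done with fields
grouping, which this sketch does not model.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical marker standing in for the "last tuple of this slice" signal.
END_OF_SLICE = object()


class SliceCollector:
    """Buffers events per time id until the end-of-slice marker arrives."""

    def __init__(self):
        self._pending = defaultdict(dict)

    def receive(self, time_id, sensor_id=None, value=END_OF_SLICE):
        """Return the finished slice as {sensor_id: value}, or None if still open."""
        if value is END_OF_SLICE:
            return self._pending.pop(time_id, {})
        self._pending[time_id][sensor_id] = value
        return None


def correlate(slice_data):
    """Placeholder for the expensive per-slice computation (here: a plain sum)."""
    return sum(slice_data.values())


def workers_needed(compute_seconds, slice_seconds):
    """Number of slices that must be in flight at once to keep up with input."""
    return math.ceil(compute_seconds / slice_seconds)


def run(events, compute_seconds=5.0, slice_seconds=1.0):
    """Feed events through the collector; compute completed slices in parallel."""
    collector = SliceCollector()
    futures = {}
    workers = workers_needed(compute_seconds, slice_seconds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for event in events:
            done = collector.receive(*event)
            if done is not None:
                futures[event[0]] = pool.submit(correlate, done)
    return {time_id: f.result() for time_id, f in futures.items()}
```

An event is a (time_id, sensor_id, value) tuple, and a one-element (time_id,)
tuple acts as the end-of-slice marker. The default compute and slice durations
are made-up numbers; a thread pool is used only to keep the sketch short, and
CPU-bound work in Python would likely need processes or separate machines
instead.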

Jonathan


On Fri, Jun 6, 2014 at 5:04 PM, Jonathan Poon <jkp...@ucdavis.edu> wrote:

> I will take a look into Trident as well.  Thanks for the tip!
>
>
> On Fri, Jun 6, 2014 at 3:53 PM, Kyle Nusbaum <knusb...@yahoo-inc.com>
> wrote:
>
>>  Sounds interesting.
>>
>> I don't know much about your project, so I won't speculate about your
>> purposes.
>>
>> One thing to consider is that the duration of the computation on a time
>> slice must be longer than the time slice itself to really make this type of
>> setup worthwhile. Otherwise you could just feed the batches through the
>> same bolt, since it would be done processing a batch before the next one
>> comes in.
>>
>> -- Kyle
>>
>> On 06/06/2014 05:40 PM, Jonathan Poon wrote:
>>
>>    Hi Kyle,
>>
>>  I'm looking for a real-time batch processing tool.  In my case, I'm
>> looking to make correlations between all of the sensors at each time
>> interval.
>>
>>  I could use Hadoop (Map Reduce), but it requires me to collect all
>> of the data before I can batch process each time partition of data from
>> each sensor.
>>
>>  Another tool I'm also looking at is Spark Streaming, which allows me to
>> collect data at different time intervals and process that batch of data
>> using Map Reduce.
>>
>>  However, Map Reduce seems inefficient because my sensor data is already
>> time sorted naturally.  In addition, I would like real-time data on the fly.
>>
>>  Seems like Storm might be a candidate for this application.  Please let
>> me know what you think...!  Thanks for your help!
>>
>> Jonathan
>>
>>
>>
>>
>> On Fri, Jun 6, 2014 at 3:32 PM, Kyle Nusbaum <knusb...@yahoo-inc.com>
>> wrote:
>>
>>>  You could send a signal tuple from the spout when it knows it's sent
>>> the last tuple for a time period, or include a field in the tuple for
>>> indicating it's the last member.
>>>
>>> I'm curious about why you want to do this, since the purpose of storm is
>>> to facilitate stream processing rather than the type of batch processing
>>> you're describing.
>>>
>>> -- Kyle
>>>
>>> On 06/06/2014 05:14 PM, Jonathan Poon wrote:
>>>
>>>  Hi Nathan,
>>>
>>>  The sensor data I have is naturally time sorted, since it's just
>>> collecting data and emitting it to a spout. Is it possible for a bolt to
>>> know when all of the tuples with the same time tag have been collected and
>>> to start processing it together?  Or is it only possible for a bolt to
>>> process each tuple one at a time?
>>>
>>>  Thanks!
>>>
>>>
>>>
>>> On Fri, Jun 6, 2014 at 3:07 PM, Nathan Leung <ncle...@gmail.com> wrote:
>>>
>>>> You can have your bolt subscribe to the spout using fields grouping and
>>>> use time tag as your key.
>>>>  On Jun 6, 2014 6:01 PM, "Jonathan Poon" <jkp...@ucdavis.edu> wrote:
>>>>
>>>>>    Hi Everyone,
>>>>>
>>>>>  I'm currently investigating different data processing tools for an
>>>>> application I'm interested in.  I have many sensors that I collect data
>>>>> from.  However, I would like to group the data from every sensor at
>>>>> predefined time intervals and process it together.
>>>>>
>>>>>  Using Storm terminology, I would have each sensor send data to a
>>>>> spout.  The spouts would then send tuples to a specific bolt that will
>>>>> process all of the data within a specific time partition.  Each spout will
>>>>> tag each event with a time id and each bolt will process data after
>>>>> collecting all of the data with the same time id tags.
>>>>>
>>>>>  Is this possible with Storm?
>>>>>
>>>>>  I appreciate your help!
>>>>>
>>>>>  Jonathan
>>>>>
>>>>
>>>
>>>
>>
>>
>
