Hi Everyone, I need your advice and have some questions!
I'm looking into using Apache Storm (Trident) for a scientific application that processes sensor data in real time. Essentially, I have hundreds of sensors emitting data over time, and I would like to compute correlations across all of the sensors. To do that, I need to batch the sensor data by time: group together all of the data from every sensor over a 1-second interval, let's say, and process it as a unit with my correlation algorithm.

The way I see the Storm topology: each sensor is connected and sends data to a spout. The spouts then send batches of tuples to a combiner bolt, which builds a larger batch of tuples that all fall within a specific time partition. That larger batch of tuples is then sent to another bolt that runs my correlation algorithm and outputs the data I need to save. I assume this type of topology can be highly parallel, processing multiple time-partitioned batches at once, continuously.

1.) Is this type of topology possible?
2.) How does the combiner bolt know that all of the data from each spout has been received before it batches everything together and sends it on to the next bolt?
3.) After processing a batch of time-partitioned data, does Storm automatically kill the thread and restart a fresh instance, or do I need to write memory-clearing code myself?
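To make the combiner step concrete, here is a minimal, framework-free sketch of the time partitioning I have in mind (plain Python, not Storm API; the function name and tuple layout are just illustrative):

```python
from collections import defaultdict

def partition_by_second(readings):
    """Group (sensor_id, timestamp, value) readings into 1-second windows.

    Stand-in for the combiner bolt's job: key each reading by its
    whole-second timestamp so one window holds every sensor's data
    for that second. Assumes non-negative timestamps, so int() is a
    valid floor.
    """
    windows = defaultdict(list)
    for sensor_id, ts, value in readings:
        windows[int(ts)].append((sensor_id, value))
    return dict(windows)

# Two sensors, two seconds of data.
readings = [
    ("s1", 0.1, 1.0), ("s2", 0.7, 2.0),
    ("s1", 1.2, 3.0), ("s2", 1.9, 4.0),
]
batches = partition_by_second(readings)
# batches[0] now holds both sensors' readings for second 0,
# ready to be handed to the correlation bolt as one unit.
```

The open question for me is the one in 2.) above: in a streaming setting this grouping only works if the combiner can tell when a window is complete, since late tuples from a slow spout would otherwise be dropped or mis-assigned.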
