Sounds interesting.

I don't know much about your project, so I won't speculate about your purposes.

One thing to consider is that the computation for a time slice must take longer than the time slice itself for this type of setup to really be worthwhile. Otherwise, you could just feed the batches through the same bolt, since it would finish processing one batch before the next one comes in.
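This rule of thumb can be made concrete with a back-of-the-envelope calculation. The sketch below assumes fixed-length time slices and a roughly constant processing time per slice. Under those assumptions, the number of bolt tasks that must work concurrently to keep up is the processing time divided by the slice length, rounded up. SliceParallelism and tasksNeeded are hypothetical names, not part of Storm.

```java
// Back-of-the-envelope sizing. Assumes fixed-length slices and a roughly
// constant per-slice processing cost. Hypothetical helper, not a Storm API.
final class SliceParallelism {
    private SliceParallelism() {}

    /**
     * Minimum number of tasks that must process slices concurrently so that
     * processing keeps up with the arrival of new slices.
     */
    static long tasksNeeded(long processingMillisPerSlice, long sliceMillis) {
        if (processingMillisPerSlice <= 0 || sliceMillis <= 0) {
            throw new IllegalArgumentException("durations must be positive");
        }
        // ceil(processing / slice) using integer arithmetic
        return (processingMillisPerSlice + sliceMillis - 1) / sliceMillis;
    }

    public static void main(String[] args) {
        // 1 s slices that take 300 ms each: a single task keeps up.
        System.out.println(tasksNeeded(300, 1_000));   // 1
        // 1 s slices that take 3.5 s each: about 4 slices are in flight at once.
        System.out.println(tasksNeeded(3_500, 1_000)); // 4
    }
}
```

A result of 1 is the case described above, where one bolt can simply handle every batch in turn. A larger value is only a starting point for the bolt's parallelism hint, since real processing times vary.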

-- Kyle

On 06/06/2014 05:40 PM, Jonathan Poon wrote:
Hi Kyle,

I'm looking for a real-time batch processing tool. In my case, I want to compute correlations between all of the sensors at each time interval.

I could use Hadoop (MapReduce), but it requires me to collect all of the data before I can batch process each time partition of data from each sensor.

Another tool I'm looking at is Spark Streaming, which lets me collect data in separate time intervals and process each batch of data using MapReduce.

However, MapReduce seems inefficient because my sensor data is already naturally sorted by time. In addition, I would like to process the data in real time, on the fly.

It seems like Storm might be a candidate for this application. Please let me know what you think! Thanks for your help!

Jonathan




On Fri, Jun 6, 2014 at 3:32 PM, Kyle Nusbaum <knusb...@yahoo-inc.com> wrote:

    You could send a signal tuple from the spout when it knows it has
    sent the last tuple for a time period, or include a field in the
    tuple indicating that it is the last member of that period.
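The signal-tuple idea can be sketched in plain Java, without any Storm classes, to show the buffering logic a bolt would need. Because there are many sensors, a single marker is not enough: the collector below waits until every expected source has sent its end-of-slice signal for a time id before releasing that batch. Reading, SliceCollector, onReading, and onEndOfSlice are hypothetical names. The sketch assumes each source sends its signal after its last reading for that slice, and that readings and signals for a given time id reach the same collector in that order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Plain-Java model of the "end-of-slice signal" idea; no Storm classes are used.
// Reading and SliceCollector are hypothetical names for illustration only.
final class Reading {
    final String sensorId;
    final long timeId;
    final double value;

    Reading(String sensorId, long timeId, double value) {
        this.sensorId = sensorId;
        this.timeId = timeId;
        this.value = value;
    }
}

final class SliceCollector {
    private final int expectedSources;                        // sources that must signal per slice
    private final Map<Long, List<Reading>> pending = new HashMap<>();
    private final Map<Long, Set<String>> signalled = new HashMap<>();

    SliceCollector(int expectedSources) {
        this.expectedSources = expectedSources;
    }

    /** Buffer an ordinary data tuple under its time id. */
    void onReading(Reading r) {
        pending.computeIfAbsent(r.timeId, k -> new ArrayList<>()).add(r);
    }

    /**
     * Record an end-of-slice signal from one source (repeats are ignored).
     * Once every expected source has signalled for this time id, remove and
     * return the buffered batch; until then, return an empty Optional.
     */
    Optional<List<Reading>> onEndOfSlice(String sourceId, long timeId) {
        Set<String> done = signalled.computeIfAbsent(timeId, k -> new HashSet<>());
        done.add(sourceId);
        if (done.size() < expectedSources) {
            return Optional.empty();
        }
        signalled.remove(timeId);
        List<Reading> batch = pending.remove(timeId);
        if (batch == null) {
            batch = new ArrayList<>();  // every source signalled, but no data arrived
        }
        return Optional.of(batch);
    }
}
```

Inside a bolt, each incoming tuple would be routed to onReading or onEndOfSlice depending on whether it is a marker, and a non-empty result would be processed as one batch. Readings that arrive after their slice has been released are not handled here.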

    I'm curious about why you want to do this, since the purpose of
    Storm is to facilitate stream processing rather than the type of
    batch processing you're describing.

    -- Kyle

    On 06/06/2014 05:14 PM, Jonathan Poon wrote:
    Hi Nathan,

    The sensor data I have is naturally time-sorted, since each sensor
    just collects data and emits it to a spout. Is it possible for a
    bolt to know when all of the tuples with the same time tag have
    been collected, and then process them together? Or can a bolt
    only process one tuple at a time?

    Thanks!



    On Fri, Jun 6, 2014 at 3:07 PM, Nathan Leung <ncle...@gmail.com> wrote:

        You can have your bolt subscribe to the spout using fields
        grouping, with the time tag as your key. That way, all tuples
        with the same time tag are sent to the same bolt task.
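Fields grouping partitions a stream by the values of the chosen fields, so every tuple with the same time tag lands on the same bolt task. The plain-Java sketch below models that routing with a simple hash-modulo rule. It illustrates the idea only and is not Storm's internal hash function; FieldsGroupingModel and taskFor are hypothetical names. In an actual topology, the equivalent wiring would be a fieldsGrouping call on the bolt's input declarer with new Fields("timeId"), assuming the spout declares an output field named timeId.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of fields grouping on a "timeId" key. Storm partitions by
// the key fields' values, but its exact hash differs; this only shows that
// equal keys always map to the same task.
final class FieldsGroupingModel {
    private FieldsGroupingModel() {}

    /** Pick a downstream task index in [0, numTasks) from the time id. */
    static int taskFor(long timeId, int numTasks) {
        return Math.floorMod(Long.hashCode(timeId), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 3;
        // Readings arriving interleaved from several sensors: (sensorId, timeId)
        String[] sensors = {"s1", "s2", "s3", "s1", "s2", "s3"};
        long[] timeIds = {100, 100, 100, 101, 101, 101};
        Map<Integer, List<String>> byTask = new TreeMap<>();
        for (int i = 0; i < sensors.length; i++) {
            int task = taskFor(timeIds[i], numTasks);
            byTask.computeIfAbsent(task, k -> new ArrayList<>())
                  .add(sensors[i] + "@" + timeIds[i]);
        }
        // Every reading for a given timeId ends up in a single task's list.
        byTask.forEach((task, items) -> System.out.println("task " + task + ": " + items));
    }
}
```

Grouping alone only guarantees that one task sees all tuples for a time tag. The task still has to decide when a tag is complete, which is where an end-of-period signal comes in.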

        On Jun 6, 2014 6:01 PM, "Jonathan Poon" <jkp...@ucdavis.edu> wrote:

            Hi Everyone,

            I'm currently investigating different data processing
            tools for an application I'm interested in. I have many
            sensors that I collect data from, and I would like to
            group the data from every sensor at predefined time
            intervals and process each group together.

            Using Storm terminology, I would have each sensor send
            data to a spout. The spouts would then send tuples to a
            specific bolt that processes all of the data within a
            given time partition. Each spout would tag each event
            with a time ID, and each bolt would process the data only
            after collecting all of the events with the same time ID.
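One way a spout could derive the time ID for each event is to bucket the reading's timestamp into fixed-width intervals. The sketch assumes epoch-millisecond timestamps and predefined intervals of equal width aligned to the Unix epoch; TimeIds, timeIdFor, and intervalStart are hypothetical names.

```java
// Assumes readings carry epoch-millisecond timestamps and that the predefined
// intervals are fixed-width and aligned to the epoch. Hypothetical helper.
final class TimeIds {
    private TimeIds() {}

    /** Index of the interval containing the timestamp; equal ids mean the same interval. */
    static long timeIdFor(long timestampMillis, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        // floorDiv keeps pre-epoch (negative) timestamps in the correct bucket
        return Math.floorDiv(timestampMillis, intervalMillis);
    }

    /** First millisecond covered by a given time id. */
    static long intervalStart(long timeId, long intervalMillis) {
        return timeId * intervalMillis;
    }

    public static void main(String[] args) {
        long interval = 5_000; // 5-second intervals
        long[] timestamps = {1_402_095_600_000L, 1_402_095_604_999L, 1_402_095_605_000L};
        for (long ts : timestamps) {
            long id = timeIdFor(ts, interval);
            System.out.println(ts + " -> timeId " + id + " (starts at " + intervalStart(id, interval) + ")");
        }
    }
}
```

The first two timestamps share a time ID and the third starts the next interval, so every sensor using the same interval width tags simultaneous readings identically, provided their clocks agree.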

            Is this possible with Storm?

            I appreciate your help!

            Jonathan
