Re: storm use case questions

Tian Guo Wed, 03 Sep 2014 01:51:48 -0700

Hi, All

Regarding the average and standard deviation of a stream from a specific
sensor: both can be computed incrementally, and each update takes constant
time. So I do not see much of a computational burden, and the implementation
is simple. Distributed stream processing also looks redundant for only
hundreds of streams.
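
The constant-time update mentioned above can be illustrated with Welford's online algorithm. It keeps a count, a running mean and a running sum of squared deviations, and updates all three in O(1) per reading. This is a minimal sketch, not code from the thread. It assumes messages are already parsed into Python dicts shaped like Yuheng's example, assumes every key starting with "reading" ends in an integer index, and reports the population standard deviation. It covers all readings seen so far. A custom range such as "a month ago to now" would need per-period accumulators on top of it.

```python
import math


class RunningStats:
    """Welford's online algorithm: O(1) update of count, mean and variance."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # Population variance; divide by (count - 1) instead for the sample variance.
        return self.m2 / self.count if self.count > 0 else 0.0

    def stddev(self) -> float:
        return math.sqrt(self.variance())


# One accumulator per (deviceId, reading index), e.g. ("BOT-N3", 0).
stats: dict[tuple[str, int], RunningStats] = {}


def on_message(msg: dict) -> None:
    """Update the accumulators for every readingN field in one parsed message.

    Assumption: each key that starts with "reading" is followed by an integer index.
    """
    device = msg["deviceId"]
    for key, value in msg.items():
        if key.startswith("reading"):
            idx = int(key[len("reading"):])
            stats.setdefault((device, idx), RunningStats()).update(float(value))
```

For example, after two messages from "BOT-N3" with reading0 values 2.25 and 2.75, the accumulator for ("BOT-N3", 0) holds a mean of 2.5 and a standard deviation of 0.25. Memory grows with the number of (device, reading) pairs, not with the number of messages.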


Storm is a cluster-based distributed data processing system rather than
a decentralized system like a sensor network. Whether it is applicable to
your scenario depends on where you deploy it in your architecture.
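
In a cluster deployment, per-sensor state (for example, one running accumulator per deviceId) stays consistent if every message for a given sensor is routed to the same worker. In Storm, a fields grouping on a bolt's input does this: tuples with the same values in the grouping fields go to the same task. The following is a minimal plain-Python sketch of that routing idea only. It is not Storm's API. `worker_for`, `partition` and `NUM_WORKERS` are hypothetical names, and using CRC-32 as the hash is an assumption.

```python
import zlib
from collections import defaultdict

NUM_WORKERS = 4  # hypothetical parallelism


def worker_for(device_id: str, num_workers: int = NUM_WORKERS) -> int:
    # zlib.crc32 is stable across processes, unlike Python's salted hash() for str.
    return zlib.crc32(device_id.encode("utf-8")) % num_workers


def partition(messages: list[dict], num_workers: int = NUM_WORKERS) -> dict[int, list[dict]]:
    """Group messages by target worker so each deviceId always lands on one worker."""
    buckets: dict[int, list[dict]] = defaultdict(list)
    for msg in messages:
        buckets[worker_for(msg["deviceId"], num_workers)].append(msg)
    return dict(buckets)
```

Because the mapping depends only on deviceId, each worker can keep its sensors' statistics in local memory without coordinating with the others, and adding workers spreads the sensors out.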

Best,


2014-09-03 8:59 GMT+02:00 Vikas Agarwal <[email protected]>:

> Hi Yuheng,
>
> We are also exploring/implementing Storm for analyzing streams of messages
> (the Twitter stream and other sources). From my short experience, one thing I
> came to know is that a lot depends on the parallelism of the spouts in
> your topology. If you can parallelize the ingestion of data using
> partitioning or something similar, you can definitely benefit from Storm;
> otherwise you will see a lot of failed messages, which may accumulate into a
> large backlog of overflowing input data.
>
>
> On Wed, Sep 3, 2014 at 1:01 AM, Yuheng Du <[email protected]>
> wrote:
>
>> Hi guys,
>>
>> I have a stream of sensor data coming from RabbitMQ. Each sensor
>> message is in JSON format and has the following fields:
>>
>> deviceId: "BOT-N3"
>> reading0: 2.25
>> reading1: 3.78
>> ....
>> readingN: -1.35
>>
>> Each float value readingN represents a sensor reading at a specific
>> field location.
>>
>> Now, for each incoming message, I want to run a query that gives me the
>> average and standard deviation of a given deviceId's readingN over a
>> custom time range (a year ago to now, a month ago to now, etc.). So if N=28,
>> for each incoming message I will need to run 28 queries on the historic data
>> at almost the same time. I need the query results to be returned in near
>> real time so that other incoming messages won't get blocked.
>>
>> Is STORM a good solution to this issue?
>>
>> I have already tried the Elasticsearch-Logstash-Kibana stack. It seems that
>> when the incoming message rate is high, messages get blocked
>> because the ES server can't respond to hundreds of query requests at
>> the same time.
>>
>> Will STORM help me in this case? What are the common use cases of STORM in
>> processing real-time sensor data (specifically from sensor networks)?
>>
>> Thanks!
>>
>> best
>>
>> Yuheng
>>
>
>
>
> --
> Regards,
> Vikas Agarwal
>
>
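
For the custom time ranges in Yuheng's question (a month ago to now, a year ago to now), a single all-time accumulator is not enough. One option, assumed here rather than proposed in the thread, is to keep one aggregate per (deviceId, reading index, day) and combine the daily aggregates at query time using the pairwise merge formula of Chan et al. for mean and variance. A year-long query then costs about 365 merges per reading, regardless of how many messages arrived. `Agg`, `record`, `query` and the daily bucket size are all hypothetical choices for illustration. The standard deviation is the population one.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Agg:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, x: float) -> None:
        # Welford update, O(1) per reading.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Agg") -> "Agg":
        # Chan et al. pairwise combination of two partial aggregates.
        if other.count == 0:
            return Agg(self.count, self.mean, self.m2)
        if self.count == 0:
            return Agg(other.count, other.mean, other.m2)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return Agg(n, mean, m2)

    def stddev(self) -> float:
        return math.sqrt(self.m2 / self.count) if self.count else 0.0


# (deviceId, reading index, day) -> aggregate of that day's readings
daily: dict[tuple[str, int, date], Agg] = {}


def record(device: str, idx: int, day: date, value: float) -> None:
    daily.setdefault((device, idx, day), Agg()).add(value)


def query(device: str, idx: int, start: date, end: date) -> Agg:
    """Combine the daily aggregates for start..end (inclusive)."""
    total = Agg()
    day = start
    while day <= end:
        bucket = daily.get((device, idx, day))
        if bucket is not None:
            total = total.merge(bucket)
        day += timedelta(days=1)
    return total
```

For example, readings 2.0 on one day and 4.0 and 6.0 on the next combine to a count of 3, a mean of 4.0 and a standard deviation of sqrt(8/3). Coarser roll-ups (for example, monthly) could be layered on the same merge to shorten long-range queries.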
