Hi,

yes this is a research type question. I already implemented a prototype 
framework in python with a similar flowing topology as in Storm and now I want 
to do this in parallel on a greater scale.

I have another question concerning stream grouping: Since in my case a 
distance-computing-bolt has to keep the tuples for the last n seconds, I cannot 
use different tasks for this bolt (!?). What I would need is, that the tuples 
are replicated for each bolt, but the actual work should be distributed across 
the tasks.
This does not mean that I need this (I don't think I will), but I just want to 
understand what's possible.

Best,
 Johannes

________________________________
From: Klausen Schaefersinho [[email protected]]
Sent: Tuesday, November 11, 2014 10:32 AM
To: [email protected]
Subject: Re: Sliding window on numerical data?

Hi,

IBM InfoSphere Streams has some kind of visual query editor. However I doubt 
that you can model a heart beat in it. If you need to start quick just write 
your own bolt and use some kind of ring-buffer to store the last n events. Then 
writer a function that converts it to a vector. Please remember that the event 
frequency is not nessecay the sampling frequency of your sensor. So your events 
should have a time stamp from the sensor and you should build the vector using 
these time stamps, not the position in the ring-buffer.

Do you know already that Dynamic Time Warping will work? Your tasks sounds 
pretty much like research, so until you know what model and representation you 
want to use it might also be the best to first conduct some experiments in R or 
SciPi and build the windows from a sensor log file.

Cheers,

Klaus



On Tue, Nov 11, 2014 at 10:07 AM, Johannes Hugo Kitschke 
<[email protected]<mailto:[email protected]>> 
wrote:
Hi,

thanks for your repsonse. I had a look at both Esper and Siddhi. Please correct 
me if I'm wrong, but I don't think these fit my purpose: Both use a SQL-Like 
query language, right?
What I am aiming at, is to visually define a query (consider for example the 
heartbeat [1]) and then compute the distance between the query and some 
datastream (for each new datapoint). But I cannot define such a complex query 
using this SQL-like query language!? Or did I miss something?

Johannes

[1] http://en.wikipedia.org/wiki/Electrocardiography#Waves_and_intervals

On 11/11/2014 08:16 AM, Sajith wrote:
Hi Johannes,

You can also use Sidddhi complex event processing engine to achieve your tasks. 
[1] is a bolt I wrote using Siddhi.

[1] 
https://github.com/sajithshn/siddhi-storm/blob/master/src/main/java/org/wso2/siddhi/storm/component/SiddhiBolt.java

On Tue, Nov 11, 2014 at 1:29 AM, Manoj Jaiswal 
<[email protected]<mailto:[email protected]>> wrote:
Hi Johannes,

We are doing something similar using Esper in storm.
The queries are set in EsperBolt and realtime data is processed through that.

-Manoj

On Mon, Nov 10, 2014 at 3:54 AM, Klausen Schaefersinho 
<[email protected]<mailto:[email protected]>> wrote:
Hi,

 > The main reason I write this mail is, because I have to access the last N 
 > elements of each stream for each new arriving element

You can not go back in a stream, so you have to write your own bolt that stores 
the last windows. That should be pretty straight forward.




On Mon, Nov 10, 2014 at 12:26 PM, Johannes Hugo Kitschke 
<[email protected]<mailto:[email protected]>> 
wrote:
Hi there,

I want to solve a task and wonder if Storm is suitable for this. My task:

    - input (many) numerical datastreams (e.g. stock market data, seismic data, 
...)
    - define (many) queries of length N (may be different for each query)
    - compute distance (e.g. using dynamic time warping) between each query and 
the last N datapoints of each datastream
    - report matches if distance < eps

This is a rough outline. I am completely new to Storm and I wonder, if Storm is 
a good candidate to solve this problem (if not, alternatives?). The main reason 
I write this mail is, because I have to access the last N elements of each 
stream for each new arriving element. But every example I found does only do 
some computation on one element at a time.

While writing this, I came across the 'RollingCountBolt'. Does this implement 
the functionality I want? Does this mean each Bolt doing the distance 
computation for the last N points has to store these points?

I would be interested in your thoughts, hints and ideas.

Thanks!
 Johannes




Reply via email to