Hi, yes this is a research type question. I already implemented a prototype framework in python with a similar flowing topology as in Storm and now I want to do this in parallel on a greater scale.
I have another question concerning stream grouping: Since in my case a distance-computing-bolt has to keep the tuples for the last n seconds, I cannot use different tasks for this bolt (!?). What I would need is, that the tuples are replicated for each bolt, but the actual work should be distributed across the tasks. This does not mean that I need this (I don't think I will), but I just want to understand what's possible. Best, Johannes ________________________________ From: Klausen Schaefersinho [[email protected]] Sent: Tuesday, November 11, 2014 10:32 AM To: [email protected] Subject: Re: Sliding window on numerical data? Hi, IBM InfoSphere Streams has some kind of visual query editor. However I doubt that you can model a heart beat in it. If you need to start quick just write your own bolt and use some kind of ring-buffer to store the last n events. Then writer a function that converts it to a vector. Please remember that the event frequency is not nessecay the sampling frequency of your sensor. So your events should have a time stamp from the sensor and you should build the vector using these time stamps, not the position in the ring-buffer. Do you know already that Dynamic Time Warping will work? Your tasks sounds pretty much like research, so until you know what model and representation you want to use it might also be the best to first conduct some experiments in R or SciPi and build the windows from a sensor log file. Cheers, Klaus On Tue, Nov 11, 2014 at 10:07 AM, Johannes Hugo Kitschke <[email protected]<mailto:[email protected]>> wrote: Hi, thanks for your repsonse. I had a look at both Esper and Siddhi. Please correct me if I'm wrong, but I don't think these fit my purpose: Both use a SQL-Like query language, right? What I am aiming at, is to visually define a query (consider for example the heartbeat [1]) and then compute the distance between the query and some datastream (for each new datapoint). But I cannot define such a complex query using this SQL-like query language!? Or did I miss something? Johannes [1] http://en.wikipedia.org/wiki/Electrocardiography#Waves_and_intervals On 11/11/2014 08:16 AM, Sajith wrote: Hi Johannes, You can also use Sidddhi complex event processing engine to achieve your tasks. [1] is a bolt I wrote using Siddhi. [1] https://github.com/sajithshn/siddhi-storm/blob/master/src/main/java/org/wso2/siddhi/storm/component/SiddhiBolt.java On Tue, Nov 11, 2014 at 1:29 AM, Manoj Jaiswal <[email protected]<mailto:[email protected]>> wrote: Hi Johannes, We are doing something similar using Esper in storm. The queries are set in EsperBolt and realtime data is processed through that. -Manoj On Mon, Nov 10, 2014 at 3:54 AM, Klausen Schaefersinho <[email protected]<mailto:[email protected]>> wrote: Hi, > The main reason I write this mail is, because I have to access the last N > elements of each stream for each new arriving element You can not go back in a stream, so you have to write your own bolt that stores the last windows. That should be pretty straight forward. On Mon, Nov 10, 2014 at 12:26 PM, Johannes Hugo Kitschke <[email protected]<mailto:[email protected]>> wrote: Hi there, I want to solve a task and wonder if Storm is suitable for this. My task: - input (many) numerical datastreams (e.g. stock market data, seismic data, ...) - define (many) queries of length N (may be different for each query) - compute distance (e.g. using dynamic time warping) between each query and the last N datapoints of each datastream - report matches if distance < eps This is a rough outline. I am completely new to Storm and I wonder, if Storm is a good candidate to solve this problem (if not, alternatives?). The main reason I write this mail is, because I have to access the last N elements of each stream for each new arriving element. But every example I found does only do some computation on one element at a time. While writing this, I came across the 'RollingCountBolt'. Does this implement the functionality I want? Does this mean each Bolt doing the distance computation for the last N points has to store these points? I would be interested in your thoughts, hints and ideas. Thanks! Johannes
