Hi, any hints? Is my previous question unclear?

Let me try to reformulate:
- How should I manage the shared buffers and the data required for
moving/sliding-window management and nearest-neighbour data aggregation?
(A simplified sketch of this state follows below.)
- Would I really benefit from moving to Storm compared with my current
script, considering that in-memory data management greatly speeds up my
current process?
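
To make that first point concrete, the shared state my current script keeps
looks roughly like this (a simplified sketch; the names and the window
handling are only illustrative):

from collections import deque

# Per-item sliding window plus a running sum, so the moving average is
# updated incrementally (delta average, no recomputation).
class ItemWindow:
    def __init__(self, maxlen):
        self.points = deque()      # (timestamp, price) pairs in the window
        self.maxlen = maxlen
        self.running_sum = 0.0

    def add(self, timestamp, price):
        self.points.append((timestamp, price))
        self.running_sum += price
        if len(self.points) > self.maxlen:
            _, old_price = self.points.popleft()
            self.running_sum -= old_price

    def moving_average(self):
        return self.running_sum / len(self.points) if self.points else None

# Shared buffers: one window per item and timeframe, the latest computed
# values per item (used for the nearest-neighbour enrichment), and a
# timestamp index to align items.
windows = {}    # (item_id, timeframe) -> ItemWindow
latest = {}     # item_id -> latest enriched record for that item
ts_index = {}   # timestamp -> ids of items updated at that timestamp

My question is essentially where this kind of state should live in a Storm
topology.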

Thank you,
Xavier


On Thu, Apr 24, 2014 at 7:40 PM, Xavier Daull <[email protected]> wrote:

> I have already developed a Python script (not using Storm) which
> transforms a stream of millions of price histories for different items
> (provided in one common CSV) and outputs a dedicated stream per item with
> enriched data in real time. The script computes and aggregates, in real
> time, the latest item price with past data to get the moving average and
> slope over different timeframes (month/week/day/hour), and adds to it the
> latest data from the nearest items (neighbours). The goal is to feed
> models for price prediction. To manage the time-aggregated data and the
> nearest-neighbour data, I use a shared buffer of the recent data needed
> for aggregation, the latest computed data for each item, and some shared
> timestamp indexes.
>
> I am wondering whether I would really benefit from moving this script to
> Storm, and how.
>
> My first understanding of Storm is that I should:
> - create a dedicated spout class to fetch the price data;
> - create a dedicated bolt class to aggregate the data (moving averages,
> slopes, cross-aggregated data between items); a rough sketch of what I
> have in mind follows below.
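>
> Roughly, I picture the aggregation bolt along these lines (an untested
> sketch on top of the multilang storm.py module; the class name, fields,
> window size and the per-item grouping are only assumptions on my side,
> and it does not cover the cross-item / nearest-neighbour part, which is
> exactly what I am unsure about):
>
> import storm
> from collections import deque
>
> class PriceAggregationBolt(storm.BasicBolt):
>     # Assumes a fields grouping on item_id, so every tuple for a given
>     # item reaches the same bolt task and its state stays local.
>     def initialize(self, conf, context):
>         self.windows = {}  # item_id -> deque of (timestamp, price)
>         self.sums = {}     # item_id -> running sum for the delta average
>
>     def process(self, tup):
>         item_id, timestamp, price = tup.values
>         window = self.windows.setdefault(item_id, deque(maxlen=100))
>         if len(window) == window.maxlen:
>             # Window is full: the oldest point is dropped on append,
>             # so remove it from the running sum first.
>             self.sums[item_id] -= window[0][1]
>         window.append((timestamp, price))
>         self.sums[item_id] = self.sums.get(item_id, 0.0) + price
>         storm.emit([item_id, timestamp, price,
>                     self.sums[item_id] / len(window)])
>
> PriceAggregationBolt().run()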
>
> Where should I put the shared buffers and the data required to efficiently
> aggregate and compute the time-aggregated and nearest-neighbour data?
>
> Will the topology hurt performance compared with in-memory data
> management? My current script, even though it is in Python, benefits
> greatly from efficient buffered computation (no recomputation, delta
> averages, ...), little data manipulation, and minimal memory access and
> computation.
>
> Thank you for your advice.
> Xavier
>
