Re: Feature Generation for Large datasets composed of many time series

2017-07-24 Thread julio . cesare
OK, thanks! That's exactly the kind of thing I was imagining with Apache Beam. I still have a few questions. - Regarding performance, will this be efficient, even with a large "window" and many ids / values / timestamps ...? - My goal after all this is to store it in Cassandra and/or use the

Re: Feature Generation for Large datasets composed of many time series

2017-07-24 Thread Lukasz Cwik
The more ids the better as this increases the parallelism in your pipeline. Also, the aggregations that you listed like min/max/average are very efficient operations to perform on datasets. Cassandra is already supported: https://github.com/apache/beam/tree/master/sdks/java/io/cassandra Using a
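
To illustrate why min/max/average are cheap per id, here is a minimal plain-Python sketch (not Beam API). It assumes records shaped as (id, timestamp, value) and fixed windows; the names Stats and aggregate are hypothetical. Each (id, window) pair keeps a small accumulator that can be merged with another, which is what lets the work be split across many ids and workers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Stats:
    """Small mergeable accumulator for min, max and mean (hypothetical name)."""
    count: int = 0
    total: float = 0.0
    lo: float = float("inf")
    hi: float = float("-inf")

    def add(self, value):
        # Fold one value into the accumulator in constant time.
        self.count += 1
        self.total += value
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)
        return self

    def merge(self, other):
        # Combine two partial results, e.g. computed on different workers.
        return Stats(self.count + other.count, self.total + other.total,
                     min(self.lo, other.lo), max(self.hi, other.hi))

    @property
    def mean(self):
        return self.total / self.count if self.count else float("nan")


def aggregate(records, window_seconds):
    """Group (id, timestamp, value) records by (id, fixed window start)."""
    buckets = defaultdict(Stats)
    for series_id, timestamp, value in records:
        # Assumption: timestamps are seconds; windows start at multiples
        # of window_seconds.
        window_start = timestamp - (timestamp % window_seconds)
        buckets[(series_id, window_start)].add(value)
    return dict(buckets)
```

Because the accumulator holds only four numbers per (id, window), memory grows with the number of ids and windows, not with the number of raw values.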