OK, thanks!
That's exactly the kind of thing I was imagining with Apache Beam.
I still have a few questions.
- Regarding performance, will this be efficient, even with large
"windows" and many ids / values / timestamps?
- My goal after all this is to store it in Cassandra and/or use the
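The more ids the better: each id can be processed independently, so more
ids means more parallelism in your pipeline.
Also, the aggregations that you listed (min, max, average) are very
efficient to compute, because they can be updated incrementally with a small
amount of state per id and partial results can be merged in any order.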
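
To make the efficiency point concrete, here is a rough sketch in plain Python (not the Beam API itself) of how a min/max/mean aggregation can be split across workers. The method names mirror the shape of a Beam combiner; the class name `MinMaxMean`, the helper `combine_per_key`, and the sample data are made up for illustration.

```python
from collections import defaultdict


class MinMaxMean:
    """Hypothetical combiner: the accumulator is (min, max, sum, count)."""

    def create_accumulator(self):
        return (float("inf"), float("-inf"), 0.0, 0)

    def add_input(self, acc, value):
        lo, hi, total, count = acc
        return (min(lo, value), max(hi, value), total + value, count + 1)

    def merge_accumulators(self, accs):
        # Merging is associative and commutative, so partial results
        # from different workers can be combined in any order.
        lo, hi, total, count = self.create_accumulator()
        for a_lo, a_hi, a_total, a_count in accs:
            lo, hi = min(lo, a_lo), max(hi, a_hi)
            total += a_total
            count += a_count
        return (lo, hi, total, count)

    def extract_output(self, acc):
        lo, hi, total, count = acc
        mean = total / count if count else float("nan")
        return {"min": lo, "max": hi, "mean": mean}


def combine_per_key(partitions, fn):
    """Simulate workers: each partition is a list of (id, value) pairs."""
    partials = defaultdict(list)
    for partition in partitions:
        local = {}  # one small accumulator per id on this "worker"
        for key, value in partition:
            acc = local.get(key, fn.create_accumulator())
            local[key] = fn.add_input(acc, value)
        for key, acc in local.items():
            partials[key].append(acc)
    # Final step: merge the per-worker partial accumulators for each id.
    return {
        key: fn.extract_output(fn.merge_accumulators(accs))
        for key, accs in partials.items()
    }


partitions = [
    [("a", 1.0), ("b", 10.0)],
    [("a", 3.0), ("b", 20.0), ("a", 5.0)],
]
result = combine_per_key(partitions, MinMaxMean())
```

Each worker only keeps four numbers per id, no matter how many values fall into the window, which is why these aggregations stay cheap as the data grows.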
Cassandra is already supported:
https://github.com/apache/beam/tree/master/sdks/java/io/cassandra
Using a