Hello, I am working on a relatively simple streaming data pipeline. Imagine the elements of a PCollection represent something like sensor readings (with event timestamps):
Reading(id: String, value: Int)

For each incoming element, at the end of the pipeline I want to have:

Reading(id: String, value: Int, meanLastHr: Double)

where meanLastHr is the mean of all the values for the given id in the hour preceding the timestamp of that particular reading.

If I implement this in Beam, to get the means I have to transform the input into a PCollection of KVs (id, value), apply a 1-hour window, CombinePerKey, and trigger on every element. My question is: what is the simplest/most idiomatic way to join these aggregated values back with the original readings? This is a one-to-one join on (event timestamp, id); I can assume that no two readings share the same pair of values.

One way I can think of is to use the PCollection with aggregated values as a side input to a ParDo, but I'm not sure how its windows would be mapped to the globally-windowed main collection (what if I applied global windowing to the side input as well?). If, for example, I used Flink directly, I could go with a KeyedCoProcessFunction, but I don't see any concept in Beam that maps to it directly.

Any help and suggestions would be appreciated.

Best regards,
Pawel
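P.S. To make the intended semantics concrete, here is a minimal plain-Python sketch (not Beam code; the field names and the choice of epoch-second timestamps are illustrative assumptions, and I'm treating the window as inclusive of the current reading):

```python
from dataclasses import dataclass

@dataclass
class Reading:
    id: str
    value: int
    ts: int  # event timestamp in epoch seconds (assumed unit)

def mean_last_hr(readings: list[Reading], current: Reading) -> float:
    """Mean of all values for current.id whose timestamp falls in
    the hour (current.ts - 3600, current.ts], including current itself."""
    window = [r.value for r in readings
              if r.id == current.id and current.ts - 3600 < r.ts <= current.ts]
    return sum(window) / len(window)

readings = [
    Reading("a", 10, 1000),
    Reading("a", 20, 2000),
    Reading("a", 30, 6000),  # the first two fall outside this reading's hour
]

print(mean_last_hr(readings, readings[1]))  # mean of 10 and 20 -> 15.0
print(mean_last_hr(readings, readings[2]))  # only itself in window -> 30.0
```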
