Hi,

I have a Beam streaming pipeline that processes live data from Pub/Sub using
sliding windows on event timestamps. I want to recompute the same metrics for
historical data stored in BigQuery. What are my options?

I have looked at
https://stackoverflow.com/questions/56702878/how-to-use-apache-beam-to-process-historic-time-series-data
and I have a couple of questions:

1. Can I use the same instance of the streaming pipeline? I don't think so,
since the watermark would already be far past the historical event timestamps.

2. Could I possibly split the pipeline and use one branch for historical
data and one for the live streaming data?

I am trying hard to avoid standing up parallel infrastructure just to process
historical data.

Any input would be very much appreciated.

Thanks,
Kishore
