Hi,

I have a Beam streaming pipeline that processes live data from PubSub using sliding windows on event timestamps. I want to recompute the metrics for historical data stored in BigQuery. What are my options?
I have looked at https://stackoverflow.com/questions/56702878/how-to-use-apache-beam-to-process-historic-time-series-data and I have a couple of questions:

1. Can I use the same running instance of the streaming pipeline? I don't think so, as the watermark would already be way past the historical event timestamps.
2. Could I split the pipeline and use one branch for the historical data and one for the live streaming data?

I am trying hard not to stand up parallel infrastructure just to process the historical data.

Any input would be very much appreciated.

Thanks,
Kishore
