Hi, I am new to the streaming world and am building a data pipeline that analyzes a data stream every hour and emits notifications. I would love to hear your advice and experiences regarding upgrading and patching a streaming job.
My prototype uses windowing to aggregate the data and do the computation. However, this leads me to wonder what happens if the pipeline needs to change significantly in the future. Presumably the old pipeline would have to be stopped and the new one launched, and any data in flight would be lost during the switch. Since the notifications are mission-critical, missing one is not acceptable for us. Have people on this mailing list run into a similar situation? I would love to hear how others are addressing this concern.

I then started to think that an alternative might be to stream the data into BigQuery and write a batch job that runs every hour to consume it. Would that be a better approach than trying to solve every problem in streaming?

Thanks,
Eric
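P.S. For concreteness, here is a rough sketch of the hourly windowed aggregation my prototype does. This is plain Python rather than actual pipeline code, and the event shape and function names are just illustrative:

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # one-hour fixed (tumbling) windows

def window_start(ts: float) -> float:
    """Floor an event timestamp to the start of its hourly window."""
    return ts - (ts % WINDOW_SECONDS)

def aggregate_hourly(events):
    """Group (timestamp, value) events into hourly windows and sum values.

    In a real streaming job the framework assigns windows and fires them
    when the watermark passes; here we just bucket a finished list.
    """
    windows = defaultdict(int)
    for ts, value in events:
        windows[window_start(ts)] += value
    return dict(windows)

# Two events land in the first hour, one in the second.
events = [(0.0, 1), (1800.0, 2), (3700.0, 5)]
print(aggregate_hourly(events))  # {0.0: 3, 3600.0: 5}
```

The upgrade worry above is that any events already bucketed into an open window (the framework's internal state) vanish if the job is stopped and replaced mid-window.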
