You might be interested in the designs[1] from the [email protected] e-mail thread around this topic.
1: https://lists.apache.org/thread.html/3cfbd650a46327afc752a220b20a6081570000725c96541c21265e7b@%3Cdev.beam.apache.org%3E

On Wed, Oct 4, 2017 at 3:06 PM, Yihua Fang <[email protected]> wrote:
> Hi,
>
> I am new to the streaming world and am trying to build a data pipeline
> that analyzes a data stream every hour to output notifications. I would
> love to hear your advice and experiences regarding upgrading and patching
> a streaming job.
>
> I started the prototype using windows to aggregate the data and do the
> computation. However, this led me to wonder what happens if the data
> pipeline needs to change significantly in the future. Potentially, the
> old pipeline would need to be stopped and a new one launched, and the
> in-flight data would be lost during the switch. Since the notifications
> are mission critical, missing one is not acceptable for us. I wonder
> whether people on this mailing list have run into a similar situation,
> and I would love to hear how others are addressing this concern.
>
> I then started to think that maybe an alternative is to aggregate the
> stream into BigQuery and write a batch job that runs every hour to
> consume it. Would this be a better alternative instead of trying to solve
> every problem in streaming?
>
> Thanks
> Eric
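Independent of the Beam-specific update question, the hourly windowed aggregation Eric describes can be sketched in plain Python. This is a simplified stand-in for fixed (tumbling) windows, not Beam's actual API; the function names and the sum-per-window aggregation are illustrative assumptions:

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # one-hour fixed (tumbling) windows


def window_start(ts: float) -> float:
    """Align a Unix timestamp to the start of its hourly window."""
    return ts - (ts % WINDOW_SECONDS)


def aggregate_hourly(events):
    """Group (timestamp, value) events into hourly windows and sum each window."""
    windows = defaultdict(float)
    for ts, value in events:
        windows[window_start(ts)] += value
    return dict(windows)


events = [
    (1507129200.0, 2.0),  # 2017-10-04 15:00:00 UTC
    (1507130100.0, 3.0),  # 15:15 UTC, same window
    (1507132800.0, 5.0),  # 16:00 UTC, next window
]
print(aggregate_hourly(events))  # → {1507129200.0: 5.0, 1507132800.0: 5.0}
```

A real Beam pipeline would express this with a fixed-windowing transform followed by a combine step, and the update/snapshot designs linked above address how such a pipeline's in-flight windowed state can survive an upgrade.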
