I looked into snapshots, and they do seem useful as a way to save state and resume. However, they aren't as seamless as the automatic checkpointing supported by other runners, which is what I was hoping for. It looked like snapshots would be user-initiated and would pause the pipeline while the snapshot is being created. I can imagine setting this up on an automated schedule, but I would still prefer something more lightweight, like checkpoints.
On Mon, Aug 29, 2022 at 8:11 PM Reuven Lax <[email protected]> wrote:
>
> Google Cloud Dataflow does support snapshots. Is this what you were looking
> for?
>
> On Mon, Aug 29, 2022 at 4:04 PM Kenneth Knowles <[email protected]> wrote:
>>
>> Hi Will, David,
>>
>> I think you'll find the best source of answer for this sort of question on
>> the user@beam list. I've put that in the To: line with a BCC: to the
>> dev@beam list so everyone knows they can find the thread there. If I have
>> misunderstood, and your question has to do with building Beam itself, feel
>> free to move it back.
>>
>> Kenn
>>
>> On Mon, Aug 29, 2022 at 2:24 PM Will Baker <[email protected]> wrote:
>>>
>>> Hello!
>>>
>>> I am wondering about using checkpoints with Beam running on Google
>>> Cloud Dataflow.
>>>
>>> The docs indicate that checkpoints are not supported by Google Cloud
>>> Dataflow:
>>> https://beam.apache.org/documentation/runners/capability-matrix/additional-common-features-not-yet-part-of-the-beam-model/
>>>
>>> Is there a recommended approach to handling checkpointing on Google
>>> Cloud Dataflow when using streaming sources like Kinesis and Kafka, so
>>> that a pipeline could be resumed from where it left off if it needs to
>>> be stopped or crashes for some reason?
>>>
>>> Thanks!
>>> Will Baker
