Hi Tobias, I think this can happen when there is a lot of backpressure on the pipeline.
I don't know if it's normal, but I have a pipeline reading from KafkaIO and writing to BigQuery in streaming mode, and I have seen checkpoints of almost 1 GB. Whenever I take a savepoint to update the pipeline, it can go up to 8 GB of data. I am on Flink 1.5.x, on premises, also using RocksDB with incremental checkpoints. So far my only solution to avoid errors while checkpointing or savepointing is to make sure the checkpoint timeout is high enough, like 20 or 30 minutes.

Kaymak, Tobias <[email protected]> wrote on Tue, 12 Feb 2019, 17:33:

> Hi,
>
> my Beam 2.10-SNAPSHOT pipeline has a KafkaIO as input and a BigQueryIO
> configured with FILE_LOADS as output. What bothers me is that even if I
> configure in my Flink 1.6 configuration
>
> state.backend: rocksdb
> state.backend.incremental: true
>
> I see states that are as big as 230 MiB and checkpoint timeouts, or
> checkpoints that take longer than 10 minutes to complete (I just saw one
> that took longer than 30 minutes).
>
> Am I missing something? Is there some room for improvement? Should I use a
> different storage backend for the checkpoints? (Currently they are stored
> on GCS).
>
> Best,
> Tobi
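
For reference, here is a rough sketch of how the longer checkpoint timeout mentioned above might be passed to a Beam pipeline on the Flink runner. It assumes that the Beam version in use exposes the checkpointingInterval and checkpointTimeoutMillis options in FlinkPipelineOptions. The values are only illustrative (a 10-minute interval and a 30-minute timeout) and are not taken from either pipeline.

```
--runner=FlinkRunner
--checkpointingInterval=600000
--checkpointTimeoutMillis=1800000
```

Both values are in milliseconds. A higher timeout only keeps slow checkpoints from being discarded; it does not make them smaller or faster.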
