I forgot to mention that we use HDFS as storage for checkpoints and savepoints.
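
For reference, a minimal sketch of what such a setup could look like in flink-conf.yaml. The namenode host, port, and directory paths are hypothetical placeholders, not our actual values; the two `state.backend` lines are the ones from Tobias' mail.

```yaml
# flink-conf.yaml (sketch; host, port and paths below are hypothetical)
state.backend: rocksdb
state.backend.incremental: true
# Where checkpoint data and metadata are written
state.checkpoints.dir: hdfs://namenode:8020/flink/checkpoints
# Default target directory for savepoints
state.savepoints.dir: hdfs://namenode:8020/flink/savepoints
```

Note that incremental mode only applies to checkpoints; savepoints are always written in full, which would be consistent with them being much larger.
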
Juan Carlos Garcia <[email protected]> wrote on Tue, 12 Feb 2019, 18:03:

> Hi Tobias,
>
> I think this can happen when there is a lot of backpressure on the
> pipeline.
>
> I don't know if it's normal, but I have a pipeline reading from KafkaIO and
> pushing to BigQuery in streaming mode, and I have seen checkpoints of almost
> 1 GB. Whenever I take a savepoint to update the pipeline, it can
> go up to 8 GB of data.
>
> I am on Flink 1.5.x, on premises, also using RocksDB with incremental
> checkpoints.
>
> So far my only solution to avoid errors while checkpointing or savepointing
> is to make sure the checkpoint timeout is high enough, like 20 or 30 minutes.
>
>
> Kaymak, Tobias <[email protected]> wrote on Tue, 12 Feb 2019,
> 17:33:
>
>> Hi,
>>
>> my Beam 2.10-SNAPSHOT pipeline has a KafkaIO as input and a BigQueryIO
>> configured with FILE_LOADS as output. What bothers me is that even if I
>> configure in my Flink 1.6 configuration
>>
>> state.backend: rocksdb
>> state.backend.incremental: true
>>
>> I see states that are as big as 230 MiB and checkpoint timeouts, or
>> checkpoints that take longer than 10 minutes to complete (I just saw one
>> that took longer than 30 minutes).
>>
>> Am I missing something? Is there some room for improvement? Should I use
>> a different storage backend for the checkpoints? (Currently they are stored
>> on GCS).
>>
>> Best,
>> Tobi
>>
