Thanks Congxian! To make sure I'm understanding correctly: if I retain 3 incremental checkpoints (say, one every minute), and I've just completed checkpoint 10, then anything in shared belongs to checkpoints 8, 9, and 10 only. So anything older than ~3 minutes can safely be deleted? The state from checkpoint 5 doesn't live on in the shared directory - at all?
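(For context, the relevant parts of our setup are roughly the following - the bucket name and exact values here are illustrative, not our real config:

    state.backend: rocksdb
    state.backend.incremental: true
    state.checkpoints.num-retained: 3
    state.checkpoints.dir: s3://our-bucket/checkpoints

with the checkpoint interval set to one minute in the job itself.)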
I ask because we have run into cases where we end up abandoning the state, and
Flink does not clean up state from, say, a previous iteration of the job if you
don't restore from it. We need to remove these files automatically, but I want
to be sure that I don't blow away older files in the shared dir belonging to
earlier, subsumed checkpoints - but you are saying that isn't possible, and
that all subsumed checkpoints will have their /shared state rewritten or
cleaned up as needed, correct?

As for entropy, where would you suggest using it? My understanding is that I
don't control anything beyond the checkpoint directory, and since shared is in
that directory I can't put entropy inside the shared directory itself (which is
what I would need). (I've sketched the entropy config as I understand it at the
bottom of this mail.)

Thanks,
Trystan

On Wed, May 6, 2020 at 7:31 PM Congxian Qiu <qcx978132...@gmail.com> wrote:

> Hi
> For the rate limit, could you please try entropy injection[1].
> For checkpoints, Flink will handle the file lifecycle (it will delete a
> file if it will never be used in the future). A file in the checkpoint
> will be there if the corresponding checkpoint is still valid.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/filesystems/s3.html#entropy-injection-for-s3-file-systems
> Best,
> Congxian
>
>
> Trystan <entro...@gmail.com> wrote on Thu, May 7, 2020 at 2:46 AM:
>
>> Hello!
>>
>> Recently we ran into an issue when checkpointing to S3. Because S3
>> rate-limits based on prefix, the /shared directory would get slammed and
>> cause S3 throttling. There is no way around this, because
>> /job/checkpoint/:id/shared is all part of the prefix, and is limited to
>> 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per
>> prefix.
>>
>> (source:
>> https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html
>> )
>>
>> Jobs sometimes also completely crash, and they leave state lying around
>> when we bring the job up fresh.
>>
>> Our solutions so far have been to 1) reduce the number of taskmanagers,
>> 2) reduce state.backend.rocksdb.checkpoint.transfer.thread.num back to 1
>> (we had increased it to speed up checkpointing/savepointing), and 3)
>> manually delete tons of old files in the shared directory.
>>
>> My question:
>> Can we safely apply a Lifecycle Policy to the directory/bucket to remove
>> things? How long is state under /shared retained? Is it only for the
>> duration of the oldest checkpoint, or could it carry forward, untouched,
>> from the very first checkpoint to the very last? This shared checkpoint
>> dir/prefix is currently limiting the scalability of our jobs. I don't
>> believe the _entropy_ trick would help this, because the issue is
>> ultimately that there's a single shared directory.
>>
>> Thank you!
>> Trystan
>>
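P.S. For concreteness, my understanding of the entropy-injection setup from [1]
is roughly the following (bucket name and entropy length are just examples, not
our real values):

    s3.entropy.key: _entropy_
    s3.entropy.length: 4
    state.checkpoints.dir: s3://our-bucket/_entropy_/checkpoints

i.e. the _entropy_ placeholder sits in the base checkpoint path and is replaced
with random characters for state data files (and dropped for the metadata
file), which is why I don't see how to get the randomness inside the shared
directory itself.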