Aeden, I want to expand my answer after having re-read your question a bit more carefully.

For point 1, the behavior you are seeing is expected. With hadoop, the metadata written by the job manager will literally include "_entropy_" in its path, while the key will be replaced in the paths of all checkpoint data files. With presto, the metadata path won't include "_entropy_" at all (it disappears, rather than being replaced by something specific).

For point 2, I'm not sure.

David

On Thu, May 19, 2022 at 2:37 PM David Anderson <da...@nosredna.org> wrote:

> This sounds like it could be FLINK-17359 [1]. What version of Flink are
> you using?
>
> Another likely explanation arises from the fact that only the
> checkpoint data files (the ones created and written by the task managers)
> will have the _entropy_ replaced. The job manager does not inject entropy
> into the path of the checkpoint metadata, so that it remains at a
> predictable URI. Since Flink only writes keyed state larger than
> state.storage.fs.memory-threshold into the checkpoint data files, and only
> those files have entropy injected into their paths, if all of your state is
> small it will all end up in the metadata file and you won't see any entropy
> injection happening. See the comments on [2] for more on this.
>
> FWIW, I would urge you to use presto instead of hadoop for checkpointing
> on S3. The performance of the hadoop "filesystem" is problematic when it's
> used for checkpointing.
>
> Regards,
> David
>
> [1] https://issues.apache.org/jira/browse/FLINK-17359
> [2] https://issues.apache.org/jira/browse/FLINK-24878
>
> On Wed, May 18, 2022 at 7:48 PM Aeden Jameson <aeden.jame...@gmail.com>
> wrote:
>
>> I have checkpoints set up against s3 using the hadoop plugin. (I'll
>> migrate to presto at some point.) I've set up entropy injection per the
>> documentation with
>>
>> state.checkpoints.dir: s3://my-bucket/_entropy_/my-job/checkpoints
>> s3.entropy.key: _entropy_
>>
>> I'm seeing some behavior that I don't quite understand.
>>
>> 1. The folder s3://my-bucket/_entropy_/my-job/checkpoints/...
>> literally exists. Meaning that "_entropy_" has not been replaced. At
>> the same time there are also a bunch of folders where "_entropy_" has
>> been replaced. Is that to be expected? If so, would someone elaborate
>> on why this is happening?
>>
>> 2. Should the paths in the checkpoints history tab in the Flink UI
>> display the path the key? With the current setup it is not.
>>
>> Thanks,
>> Aeden
>>
>> GitHub: https://github.com/aedenj
>> Linked In: http://www.linkedin.com/in/aedenjameson
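
To make the answer to point 1 concrete, here is a rough Python sketch of the path handling as described in this thread. It is only a model of the observed behavior, not Flink's code: the function `resolve_path`, its parameters, and the 4-character entropy length are assumptions made for illustration.

```python
import random
import string

# Value of s3.entropy.key from the original message.
ENTROPY_KEY = "_entropy_"


def resolve_path(path, kind, filesystem, entropy_length=4):
    """Model where a checkpoint file lands, per the description in this thread.

    kind:        "data" (written by task managers) or "metadata" (job manager)
    filesystem:  "hadoop" or "presto"
    All names here are hypothetical; this is not a Flink API.
    """
    if ENTROPY_KEY not in path:
        return path

    if kind == "data":
        # Data files get the key replaced by random characters.
        alphabet = string.ascii_lowercase + string.digits
        entropy = "".join(random.choice(alphabet) for _ in range(entropy_length))
        return path.replace(ENTROPY_KEY, entropy, 1)

    # Metadata files never get random entropy, so their URI stays predictable.
    if filesystem == "hadoop":
        # The key is left in the path literally.
        return path
    # With presto the key segment disappears from the path.
    return path.replace(ENTROPY_KEY + "/", "", 1)


base = "s3://my-bucket/_entropy_/my-job/checkpoints"
print(resolve_path(base, "metadata", "hadoop"))
print(resolve_path(base, "metadata", "presto"))
print(resolve_path(base, "data", "hadoop"))
```

Under this model, a bucket written through hadoop ends up with both a literal "_entropy_" folder (metadata) and several randomly named folders (data files), which matches what Aeden observed.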