I am working on a POC High Availability installation of Flink on top of Kubernetes with HDFS as the data storage location. I am not finding much documentation on this setup, or I am finding it in pieces and may not be putting them together correctly. I think it falls between being an HDFS thing and a Flink thing.
I am deploying to Kubernetes using the flink:1.7.0-hadoop27-scala_2.11 container from Docker Hub. I think these are the things I need to do:

1. Set up an hdfs-site.xml file per https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Deployment (rough sketch at the end of this post).
2. Set the HADOOP_CONF_DIR environment variable to the location of that file per https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#hdfs (also sketched at the end of this post).
3. Create a flink-conf.yaml file that looks something like:

   ```
   fs.default-scheme: hdfs://
   state.backend: rocksdb
   state.savepoints.dir: hdfs://flink/savepoints
   state.checkpoints.dir: hdfs://flink/checkpoints
   ```

4. Dance a little jig when it works.

Has anyone set this up? If so, am I missing anything?

-Steve
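
For step 1, here is a rough sketch of the client-side hdfs-site.xml I have in mind, following the QJM doc linked above. The nameservice id (mycluster), the NameNode ids (nn1, nn2), and the hostnames/ports are placeholders for whatever the actual HDFS cluster uses:

```
<configuration>
  <!-- Logical name for the HA HDFS cluster -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- The NameNodes that make up the nameservice -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode-0.hdfs:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode-1.hdfs:8020</value>
  </property>
  <!-- Lets clients (like Flink) find the active NameNode -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

My understanding is that since Flink is only an HDFS client, the client-side properties above should be enough and the JournalNode/fencing settings from the doc stay on the HDFS side.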
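
For step 2, here is a sketch of how I'd wire HADOOP_CONF_DIR into the Flink pods, assuming the hdfs-site.xml (and core-site.xml) are stored in a ConfigMap named hadoop-config; the Deployment name and labels are just placeholders:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
        - name: jobmanager
          image: flink:1.7.0-hadoop27-scala_2.11
          args: ["jobmanager"]
          env:
            # Point Flink's Hadoop filesystem support at the HA HDFS config
            - name: HADOOP_CONF_DIR
              value: /etc/hadoop/conf
          volumeMounts:
            - name: hadoop-config
              mountPath: /etc/hadoop/conf
      volumes:
        # ConfigMap holding hdfs-site.xml / core-site.xml
        - name: hadoop-config
          configMap:
            name: hadoop-config
```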
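For step 3, I assume the flink-conf.yaml can be delivered the same way, i.e. from a ConfigMap mounted over the image's config directory (/opt/flink/conf in the stock image, if I'm reading it right), but I have not verified that part yet.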