I am working on a proof-of-concept High Availability installation of Flink on
top of Kubernetes, with HDFS as the data storage location. I am not finding
much documentation on this setup — or rather, I am finding it in pieces and am
not sure I am putting them together correctly. I think it falls between being
an HDFS thing and a Flink thing.

I am deploying to Kubernetes using the flink:1.7.0-hadoop27-scala_2.11 image
from Docker Hub.

I think these are the things I need to do:
1) Setup an hdfs-site.xml file per
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Deployment
2) Set the HADOOP_CONF_DIR environment variable to the location of that
file per
https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#hdfs
3) Create a flink-conf.yaml file that looks something like
        fs.default-scheme: hdfs://
        state.backend: rocksdb
        state.savepoints.dir: hdfs://flink/savepoints
        state.checkpoints.dir: hdfs://flink/checkpoints
4) Dance a little jig when it works.
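For step 1, here is a rough sketch of the client-side hdfs-site.xml I have in mind, following the QJM deployment doc linked above. The nameservice name "flink" and the NameNode hostnames are my guesses — substitute whatever the actual HDFS cluster uses:

```xml
<configuration>
  <!-- Logical name for the HA nameservice; must match the authority in hdfs:// URIs -->
  <property>
    <name>dfs.nameservices</name>
    <value>flink</value>
  </property>
  <!-- The two NameNode IDs behind the nameservice -->
  <property>
    <name>dfs.ha.namenodes.flink</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address for each NameNode (hostnames are placeholders) -->
  <property>
    <name>dfs.namenode.rpc-address.flink.nn1</name>
    <value>namenode-0.hdfs:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.flink.nn2</name>
    <value>namenode-1.hdfs:8020</value>
  </property>
  <!-- Lets clients fail over between the NameNodes -->
  <property>
    <name>dfs.client.failover.proxy.provider.flink</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

If the nameservice really is named "flink", then URIs like hdfs://flink/savepoints in flink-conf.yaml should resolve against it.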
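For step 2, my plan is to ship the hdfs-site.xml via a ConfigMap and point HADOOP_CONF_DIR at the mount path in the pod spec. Something like this fragment of the Flink Deployment (ConfigMap name and mount path are just what I picked):

```yaml
# Fragment of the JobManager/TaskManager pod template (names are placeholders)
spec:
  containers:
    - name: flink
      image: flink:1.7.0-hadoop27-scala_2.11
      env:
        # Flink picks up Hadoop config from this directory
        - name: HADOOP_CONF_DIR
          value: /etc/hadoop/conf
      volumeMounts:
        - name: hadoop-conf
          mountPath: /etc/hadoop/conf
          readOnly: true
  volumes:
    - name: hadoop-conf
      configMap:
        name: hadoop-conf   # ConfigMap holding hdfs-site.xml (and core-site.xml)
```

I assume core-site.xml (with fs.defaultFS) belongs in the same ConfigMap, but I have not verified whether Flink needs it when fs.default-scheme is set.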

Has anyone set this up? If so, am I missing anything?

-Steve
