Understanding RocksDBStateBackend in Flink on Yarn on AWS EMR

Sachin Mittal Thu, 21 Mar 2024 22:55:04 -0700

Hi,
We are using AWS EMR, where we submit our Flink jobs to a long-running
Flink cluster on YARN.


We wanted to configure RocksDBStateBackend as our state backend to store
our checkpoints.

So we have configured the following properties in our flink-conf.yaml:

   - state.backend.type: rocksdb
   - state.checkpoints.dir: file:///tmp
   - state.backend.incremental: true

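For reference, a minimal sketch of how these three settings would look in the file itself. The keys and values are exactly the ones listed above; the comments are added only for illustration.

```yaml
# flink-conf.yaml (sketch; only the three keys listed above)

# Keep working state in an embedded RocksDB instance on each TaskManager's local disk.
state.backend.type: rocksdb

# Directory where checkpoint data and metadata are written.
state.checkpoints.dir: file:///tmp

# Upload only the changes since the previous checkpoint instead of a full snapshot.
state.backend.incremental: true
```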

My question is about the checkpoint location: what is the difference
between using a local filesystem and using the Hadoop Distributed File
System (HDFS)?

What advantages do we get if we use:

*state.checkpoints.dir*: hdfs://namenode-host:port/flink-checkpoints
vs
*state.checkpoints.dir*: file:///tmp

Also, if we decide to use HDFS, where can we get the value for
*namenode-host:port*,
given that we are running Flink on EMR?

Thanks
Sachin
