Hi,
I'm trying to enable HA for my Flink jobs running on AWS EMR.
Following [1], I created a common Flink YARN session and am submitting all my
jobs to it. These four config parameters were added:
high-availability = zookeeper
high-availability.storageDir =
high-availability.zookeeper.path.root = /flink
high-availability.zookeeper.quorum = <EMR's master node's DNS name>:2181
(The ZooKeeper that ships with EMR was used.)
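
Since misspelled or empty keys in these settings are easy to overlook, here is a minimal sketch of a sanity check. It assumes the settings are available as text in `key = value` or `key: value` form. The helper names `parse_conf` and `check_ha_keys` are hypothetical and not part of Flink, and `KNOWN_HA_KEYS` is only a partial list of the documented HA option names, so a key reported as unknown may still be valid.

```python
import re

# Partial list of documented Flink HA option names (an assumption for this
# sketch; other valid high-availability.* options exist).
KNOWN_HA_KEYS = {
    "high-availability",
    "high-availability.storageDir",
    "high-availability.cluster-id",
    "high-availability.zookeeper.quorum",
    "high-availability.zookeeper.path.root",
}

# A key (no whitespace, ':' or '='), then ':' or '=', then the rest as value.
_LINE = re.compile(r"^([^:=\s]+)\s*[:=]\s*(.*)$")


def parse_conf(text):
    """Parse 'key = value' / 'key: value' lines into a dict, skipping comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match:
            conf[match.group(1)] = match.group(2).strip()
    return conf


def check_ha_keys(conf):
    """Return (unknown_keys, empty_keys) among the high-availability* settings."""
    ha_keys = [k for k in conf if k.startswith("high-availability")]
    unknown = sorted(k for k in ha_keys if k not in KNOWN_HA_KEYS)
    empty = sorted(k for k in ha_keys if conf[k] == "")
    return unknown, empty
```

Calling `check_ha_keys(parse_conf(text))` on the contents of the configuration returns two sorted lists: keys not in the partial list above, and keys whose value is empty.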
The command to start that Flink YARN session is like this:
`flink-yarn-session -Dtaskmanager.memory.process.size=4g -nm FlinkCommonSession -z FlinkCommonSession -d`
The first HA test - yarn application killed - went well. I killed that
common session by using `yarn application --kill <appId>` and created a
new session using the same command, then the jobs were restored
automatically after that session was up.
However, the 2nd HA test - EMR cluster crashed - didn't work: the jobs were
not restored after the common session was created on the new EMR cluster.
(attached: jobmanager.gz
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/jobmanager.gz>)
Should I expect the jobs to be restored in scenario no. 2 (EMR cluster
crashed)?
Am I missing something here?
Thanks for your help.
Regards,
Averell
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/yarn_setup.html