Re:yarn-session vs cluster per job for streaming jobs

Haibo Sun Wed, 17 Jul 2019 19:58:18 -0700

Hi, Maxim

Regarding the concern in your first point:
If HA and checkpointing are enabled, the AM (the application master, i.e. the
job manager you mentioned) will be restarted by YARN after it dies, and the
dispatcher will then try to restore all the previously running jobs. Note
that the number of restart attempts is determined by the configuration options
"yarn.resourcemanager.am.max-attempts" and "yarn.application-attempts". The
obvious difference between the session and per-job modes is that in session
mode a fatal error on the AM affects all jobs running in it, while in per-job
mode it affects only one job.
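
To illustrate how the two attempt settings interact, here is a minimal sketch.
It assumes that YARN caps the per-application value
("yarn.application-attempts", set on the Flink side) at the cluster-wide limit
("yarn.resourcemanager.am.max-attempts", set in yarn-site.xml), and that a
non-positive per-application value falls back to the cluster-wide limit. The
function name and the sample values are hypothetical, chosen only for
illustration.

```python
def effective_am_attempts(flink_application_attempts: int,
                          yarn_am_max_attempts: int) -> int:
    """Return the number of AM attempts YARN would actually allow.

    Assumption: the cluster-wide limit acts as an upper bound on the
    per-application setting, and a non-positive per-application value
    means "use the cluster-wide limit".
    """
    if flink_application_attempts <= 0:
        return yarn_am_max_attempts
    return min(flink_application_attempts, yarn_am_max_attempts)


# Hypothetical values: Flink asks for 10 attempts, the cluster allows 4.
print(effective_am_attempts(10, 4))
```

Under that assumption, raising only "yarn.application-attempts" has no effect
beyond the cluster-wide limit, so both settings need to be checked.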

You can look at this document to see how to configure HA for a Flink cluster
on YARN:
https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/jobmanager_high_availability.html#yarn-cluster-high-availability

Best,
Haibo

At 2019-07-17 23:53:15, "Maxim Parkachov" <lazy.gop...@gmail.com> wrote:

Hi,

I'm looking for advice on how to run Flink streaming jobs on a YARN cluster in
a production environment. In a testing environment I tried both approaches
with HA mode, namely a YARN session with multiple jobs vs. a cluster per job.
Both seem to work for my cases, with a slight preference for YARN session mode
so that credentials can be managed centrally. I'm looking to run about 10
streaming jobs, mostly reading from and writing to Kafka and Cassandra, with
the following restrictions:
1. YARN nodes will be hard rebooted quite often, roughly every 2 weeks. My
concern here is what happens when the job manager dies in session mode.
2. There are often network interruptions/slowdowns.
3. I'm trying to minimise job restart time to keep processing as continuous
as possible.

Thanks in advance,
Maxim.
