Hi Haibo,

Thanks for the tip, I had almost forgotten about max-attempts. I understand
the implications of running with a single AM.

Maybe my question was poorly phrased, but which would be faster (with regard
to the downtime of each job):

1. In case of yarn-session: cancel all jobs in parallel with savepoints,
restart the yarn-session, start all jobs in parallel from the savepoints.
2. In case of per-job mode: cancel all jobs in parallel with savepoints,
start all jobs in parallel from the savepoints.
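
To make the two sequences concrete, here is a rough sketch of how the
parallel cancel and restart could be scripted. It assumes the Flink 1.8 CLI
is on PATH as `flink` and uses `flink cancel -s`, `flink run -d -s`, `-yid`
and `-m yarn-cluster`. The helper names, job IDs, jar names and paths are
hypothetical, and the savepoint path printed by each cancel still has to be
captured and passed to the matching run command.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

FLINK = "flink"  # assumption: the Flink CLI is on PATH


def cancel_cmd(job_id, savepoint_dir, yarn_app_id=None):
    """Build `flink cancel -s <dir> [-yid <appId>] <jobId>`."""
    cmd = [FLINK, "cancel", "-s", savepoint_dir]
    if yarn_app_id:
        # Address a specific YARN application (session or per-job cluster).
        cmd += ["-yid", yarn_app_id]
    return cmd + [job_id]


def run_cmd(jar, savepoint_path, per_job=False):
    """Build `flink run -d -s <savepoint> [-m yarn-cluster] <jar>`."""
    cmd = [FLINK, "run", "-d", "-s", savepoint_path]
    if per_job:
        # Per-job mode: each submission starts its own YARN application.
        cmd += ["-m", "yarn-cluster"]
    return cmd + [jar]


def run_parallel(commands, max_workers=10):
    """Run all commands concurrently; return exit codes in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: subprocess.run(c).returncode, commands))
```

Timing `run_parallel` over the cancel commands and then over the run
commands in each mode would give a direct measurement of the per-job
downtime, rather than relying on an impression.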

I want to optimise the standard situation where I deploy a new version of
all my jobs. My current impression is that jobs start faster in yarn-session
mode.

Thanks,
Maxim.


On Thu, Jul 18, 2019 at 4:57 AM Haibo Sun <sunhaib...@163.com> wrote:

> Hi, Maxim
>
> Regarding the concern in your first point:
> If HA and checkpointing are enabled, the AM (the application master, that
> is, the job manager you mentioned) will be restarted by YARN after it dies,
> and the dispatcher will then try to restore all the previously running
> jobs. Note that the number of attempts is determined by the
> configurations "yarn.resourcemanager.am.max-attempts" and
> "yarn.application-attempts". The obvious difference between the session and
> per-job modes is that a fatal error on the AM in session mode affects all
> jobs running in it, while in per-job mode it affects only one job.
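>
> As a rough sketch of the Flink side of this (not a verified setup: the
> ZooKeeper quorum, storage path and attempt count below are placeholder
> values), the relevant flink-conf.yaml entries might look like the
> following. The effective number of attempts is capped by
> yarn.resourcemanager.am.max-attempts in yarn-site.xml.
>
> ```yaml
> # flink-conf.yaml (placeholder values, adjust to your cluster)
> high-availability: zookeeper
> high-availability.zookeeper.quorum: zk-host-1:2181,zk-host-2:2181
> high-availability.storageDir: hdfs:///flink/recovery
> # Must not exceed yarn.resourcemanager.am.max-attempts in yarn-site.xml
> yarn.application-attempts: 10
> ```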
>
> You can look at this document to see how to configure HA for the Flink
> cluster on YARN:
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/jobmanager_high_availability.html#yarn-cluster-high-availability
>
> Best,
> Haibo
>
> At 2019-07-17 23:53:15, "Maxim Parkachov" <lazy.gop...@gmail.com> wrote:
>
> Hi,
>
> I'm looking for advice on how to run Flink streaming jobs on a YARN cluster
> in a production environment. In a testing environment I tried both
> approaches with HA mode, namely yarn session + multiple jobs vs. cluster
> per job. Both seem to work for my cases, with a slight preference for yarn
> session mode to centrally manage credentials. I'm looking to run about 10
> streaming jobs, mostly reading/writing from Kafka + Cassandra, with the
> following restrictions:
> 1. YARN nodes will be hard rebooted quite often, roughly every 2 weeks. My
> concern here is what happens when the job manager dies in session mode.
> 2. There are often network interruptions/slowdowns.
> 3. I'm trying to minimise the time to restart a job, to keep processing as
> continuous as possible.
>
> Thanks in advance,
> Maxim.
>
>
